Abstract — We consider a new formulation of a class of synchronization error channels and derive analytical bounds and numerical estimates for the capacity of these channels. For the binary channel with only deletions, we obtain an expression for the symmetric information rate in terms of subsequence weights which reduces to a tight lower bound for small deletion probabilities. We are also able to exactly characterize the Markov-1 rate for the binary channel with only replications. For a channel that introduces deletions as well as replications of input symbols, we design two sequences of approximating channels with favourable properties. In particular, we parameterize the state space associated with these approximating channels and show that the information rates approach that of the deletion-replication channel as the state space grows. For the case of the channel where deletions and replications occur with the same probabilities, a stronger result on the convergence of mutual information rates is shown. The numerous advantages this new formulation presents are explored.

Index Terms — Synchronization errors, deletions, insertions, 
replications, channel capacity. 



I. Introduction 

CHANNELS with synchronization errors have been familiar to information and coding theorists and practitioners alike ever since the advent of the digital information era. Although Dobrushin [2] established the coding theorem for such channels as early as 1967, estimating information rates and constructing codes with good performance for these channels have proved to be very difficult. In the last decade, significant progress has been made in estimating achievable information rates for certain channels with synchronization errors. However, a coding scheme with provably "good" performance remains elusive thus far.

In this paper, we start with Dobrushin's model of channels 
with synchronization errors, henceforth referred to as the 
synchronization error channel (SEC), and convert it into an 
equivalent channel with states. Using this alternative model, 
we construct a sequence of channels that "approximate" the 

A. R. Iyengar was with the Department of Electrical and Computer 
Engineering and the Center for Magnetic Recording Research, University of 
California, San Diego. He is now with Qualcomm Technologies Inc., Santa 
Clara CA 95051 USA (e-mail: ariyengar@qti.qualcomm.com). P. H. Siegel is 
with the Department of Electrical and Computer Engineering and the Center 
for Magnetic Recording Research, University of California, San Diego, La 
Jolla, CA 92093 USA (e-mail: psiegel@ucsd.edu). J. K. Wolf (deceased) was 
with the Department of Electrical and Computer Engineering and the Center 
for Magnetic Recording Research, University of California, San Diego, La 
Jolla, CA 92093 USA. 

This work was supported in part by the Center for Magnetic Recording Research and by the National Science Foundation under the Grant CCF-0829865. A summary of some of the results in Sections II through IV was presented at the 2011 International Symposium on Information Theory (ISIT), St. Petersburg, Russia [1].



SEC and whose limit is the SEC. We use these approximate channels to derive some results about information rates achievable over the SEC. Although the motivation behind the alternative model is straightforward, its use to obtain non-trivial bounds on the capacity of the SEC has, to the best of our knowledge, not been reported in the literature. While the present paper concerns only a few asymptotic results on information rates of the SEC, we think that the model presented here can be utilized to design codes for SECs in general.

The remainder of this paper is organized as follows. In Section II, we revisit Dobrushin's model of an SEC and recall the main results on the capacity of SECs. Through much of the paper, we consider a special case of the generic SEC, the deletion-replication channel (DRC), and construct an equivalent channel by viewing the DRC as a channel with states in Section III. Under further special cases of channels with only deletions or only replications, we give some simple, non-trivial and sometimes tight bounds on the capacity in Sections IV-A and IV-B. We then construct a sequence of finite-state channels that approximate the DRC and establish certain properties of this sequence of channels that serve as bounds for the capacity of the DRC in Section V-A. In Section VI, we note the application of similar strategies to more general SECs, and we conclude with a summary and remarks in Section VII.

II. Synchronization Error Channels 

Remark 1 (Notation): Non-random variables are written as lowercase letters, e.g., $n$. We denote sets by double-stroke uppercase letters, e.g., $\mathbb{X}$. We will reserve $\mathbb{N}$, $\mathbb{Z}$ and $\mathbb{R}$ to denote the sets of natural numbers, integers and real numbers, respectively. $\mathbb{Z}^+$ denotes the set of non-negative integers. We define

$$[n] \triangleq \{1, 2, \ldots, n\},\ n \in \mathbb{N}, \qquad [0] \triangleq \emptyset,$$

$$[m:n] \triangleq \begin{cases} \{m, m+1, \ldots, n\}, & m \le n, \\ \emptyset, & n < m, \end{cases}$$

and

$$\mathbb{Z}_{\pm m} \triangleq \{-m, -m+1, \ldots, 0, 1, \ldots, m\} \quad \forall\ m \in \mathbb{Z}^+.$$



For some $n \in \mathbb{N}$, we will let $\mathbb{X}^n$ denote the set of vectors of dimension $n$ with elements from $\mathbb{X}$. We will write $\overline{x}$ to denote a string, and $\lambda$ to denote the empty string. The length of a string, denoted $|\overline{x}|$, is the number of symbols in it, and by definition, $|\lambda| = 0$. With some abuse of notation, we will use "vectors of dimension $n$" and "strings of length $n$" interchangeably. The set of all strings of length $n$ over the alphabet $\mathbb{X}$ is hence also denoted $\mathbb{X}^n$, and $\mathbb{X}^0 = \{\lambda\}$. We write $\overline{\mathbb{X}}$ to denote the set of all strings over the set $\mathbb{X}$, i.e.,

$$\overline{\mathbb{X}} = \bigcup_{n \in \mathbb{Z}^+} \mathbb{X}^n.$$



The bar "$\|$" will denote the concatenation operation, so that $\overline{x} \| \overline{y}$ is the concatenation of strings $\overline{x}$ and $\overline{y}$.

Throughout the paper, we assume an underlying probability space $(\Omega, \mathcal{F}, P)$ over which random variables, denoted by uppercase letters, e.g., $X$, are defined. Random vectors are denoted by uppercase letters with the multiset of indices as subscripts, e.g., $X_{[n]} = (X_1, X_2, \ldots, X_n)$, or $X_{Y_{[n]}}$ when the multiset of indices is itself the elements of a random vector $Y_{[n]}$. Random processes (assumed discrete-time) are denoted by script letters, e.g., $\mathcal{X}$, or subscripted by the set of natural numbers, $X_{\mathbb{N}}$.

We will use the asymptotic notations O(-), o(-), uj(-) as in 
0.0. □ 

We start by defining the synchronization error channels as considered by Dobrushin [2].

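Definition 1 (Memoryless SECs): Let $\mathbb{X}$ be a finite set. A memoryless synchronization error channel is specified by a stochastic matrix

$$\{q(\overline{y} \mid x),\ \overline{y} \in \overline{\mathbb{Y}},\ x \in \mathbb{X}\}$$

where $\mathbb{Y}$ is the output alphabet. From the properties of a stochastic matrix, we have

$$0 \le q(\overline{y} \mid x) \le 1, \qquad \sum_{\overline{y} \in \overline{\mathbb{Y}}} q(\overline{y} \mid x) = 1 \quad \forall\ x \in \mathbb{X}. \tag{1}$$

Further, we will assume that the mean value of the length of the output string arising from one input symbol is strictly positive and finite, i.e.,

$$0 < \sum_{\overline{y} \in \overline{\mathbb{Y}}} |\overline{y}|\, q(\overline{y} \mid x) < \infty \quad \forall\ x \in \mathbb{X}. \tag{2}$$

For $x_{[n]} = (x_1, x_2, \ldots, x_n) \in \mathbb{X}^n$ and $\overline{y}_{[n]} = (\overline{y}_1, \overline{y}_2, \ldots, \overline{y}_n) \in \overline{\mathbb{Y}}^n$, we write

$$q_n(\overline{y}_{[n]} \mid x_{[n]}) = \prod_{i=1}^{n} q(\overline{y}_i \mid x_i).$$

Let $\|\overline{y}_{[n]}$ denote the concatenation $\overline{y}_1 \overline{y}_2 \cdots \overline{y}_n$ of the strings $\overline{y}_i$, $i \in [n]$. Then the transition probabilities of the memoryless SEC are defined as

$$Q_n(\overline{y} \mid x_{[n]}) = \sum_{\overline{y}_{[n]} :\ \|\overline{y}_{[n]} = \overline{y}} q_n(\overline{y}_{[n]} \mid x_{[n]}) \tag{3}$$

for $\overline{y} \in \overline{\mathbb{Y}}$ and $x_{[n]} \in \mathbb{X}^n$. The memoryless SEC is given by the triplet $\mathbf{Q}_n = (\mathbb{X}, Q_n, \mathbb{Y})$, the input and the output alphabets, and the transition probabilities between input strings of length $n$ and all output strings. □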

Consider the sequence of memoryless SECs $\{\mathbf{Q}_n\}_{n=1}^{\infty}$. Then, we have the following.

Theorem 2 (Capacity [2]): Let $X_{[n]}$ and $\overline{Y}$ denote the input and the output of the SEC $\mathbf{Q}_n$. Let

$$C_n = \frac{1}{n} \sup_{P(x_{[n]})} I(X_{[n]}; \overline{Y}).$$

Then,

$$C = \lim_{n \to \infty} C_n = \inf_{n \ge 1} C_n$$

exists and is equal to the capacity of the sequence of SECs. ■

The quantity C represents the maximum rate at which infor- 
mation can be transferred over the SEC with vanishing error 
probability. Furthermore, the following result shows that, in 
estimating the capacity of the SEC, we can restrict ourselves 
to a subclass of possible input processes X. 



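Proposition 3 (Markov Capacity [2]): Let $\mathcal{X}_{\mathcal{M}}$ be a stationary, ergodic, Markov process over $\mathbb{X}$. Then the capacity of the sequence $\{\mathbf{Q}_n\}_{n=1}^{\infty}$ is

$$C = \sup_{\mathcal{X}_{\mathcal{M}}} \lim_{n \to \infty} \frac{1}{n} I(X_{[n]}; \overline{Y}).$$

The capacity is therefore the supremum of the rates achievable through stationary, ergodic, Markov processes $\mathcal{X}_{\mathcal{M}}$. ■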

We will now give an example of a memoryless SEC. 
Throughout the paper, we will assume that the input alphabet 
for the SECs is X = {0, 1}, i.e., the channels considered are 
binary memoryless SECs. However, we note here that all the 
results in the paper can be straightforwardly extended to the 
case where X is any finite set. 

Example 4 (Deletion-Replication Channel (DRC)): Consider the binary SEC with $\mathbb{X} = \mathbb{Y} = \{0, 1\}$ and the following stochastic matrix.

$$q(\overline{y} \mid x) = \begin{cases} p_d, & \overline{y} = \lambda, \\ p_t\, p_r^{\ell-1}, & \overline{y} = x^{\ell},\ \forall\ \ell \ge 1, \\ 0, & \text{otherwise}. \end{cases}$$

Intuitively, we can think of $p_d$ as the deletion probability, $p_t$ as the transmission probability, and $p_r$ as the replication probability, i.e., when $x \in \mathbb{X}$ is sent, it is either deleted with probability $p_d$ or transmitted and replicated $(\ell - 1)$ times with probability $p_t\, p_r^{\ell-1}$ for $\ell \ge 1$. From (1), we get for $p_r < 1$

$$p_d + \sum_{\ell \ge 1} p_t\, p_r^{\ell-1} = p_d + \frac{p_t}{1 - p_r} = 1,$$

or equivalently

$$p_t = (1 - p_d)(1 - p_r). \tag{4}$$

From (2),

$$0 < \sum_{\ell = 1}^{\infty} \ell\, p_t\, p_r^{\ell-1} = \frac{p_t}{(1 - p_r)^2} = \frac{1 - p_d}{1 - p_r} < \infty$$

where we use Equation (4). Hence $(p_d, p_r) \in [0, 1)^2$. Note that when $p_r = 0$, the DRC is the same as the binary deletion channel (BDC); and when $p_d = 0$, it is the binary replication channel (BRC), also referred to as the geometric binary sticky channel [5]. □
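The stochastic matrix of Example 4 can be sampled directly, one input symbol at a time. The following is a minimal sketch under that reading of the example; the function names (`transmission_prob`, `drc_symbol_output`, `drc_channel`, `mean_output_length`) are illustrative choices and do not appear in the paper.

```python
import random


def transmission_prob(p_d, p_r):
    """p_t as in Equation (4): p_t = (1 - p_d)(1 - p_r)."""
    return (1.0 - p_d) * (1.0 - p_r)


def drc_symbol_output(x, p_d, p_r, rng):
    """Output string produced by one input symbol x.

    Empty with probability p_d; otherwise x repeated l >= 1 times with
    probability p_t * p_r**(l - 1), sampled as a geometric number of copies.
    """
    if rng.random() < p_d:
        return []
    out = [x]
    while rng.random() < p_r:
        out.append(x)
    return out


def drc_channel(x, p_d, p_r, rng):
    """Concatenate the per-symbol output strings (Dobrushin's view)."""
    y = []
    for symbol in x:
        y.extend(drc_symbol_output(symbol, p_d, p_r, rng))
    return y


def mean_output_length(p_d, p_r):
    """Mean output length per input symbol, (1 - p_d) / (1 - p_r)."""
    return (1.0 - p_d) / (1.0 - p_r)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(20000)]
    y = drc_channel(x, 0.2, 0.3, rng)
    print(len(y) / len(x), mean_output_length(0.2, 0.3))
```

With a long input, the empirical ratio of output length to input length should settle near $(1-p_d)/(1-p_r)$, the mean computed from (2).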
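$$\begin{aligned}
P_n(\overline{y} \mid x_{[n]}, Z_0 = 0) &= \sum_{z :\, |z| = |\overline{y}|} P(Z = z, Y = \overline{y} \mid X_{[n]} = x_{[n]}, Z_0 = 0) \\
&= \sum_{z :\, |z| = |\overline{y}|} P(Z = z \mid Z_0 = 0)\, P(Y = \overline{y} \mid X_{[n]} = x_{[n]}, Z_0 = 0, Z = z) \\
&= \sum_{z :\, |z| = |\overline{y}|} \prod_{i=1}^{|\overline{y}|} \Big( P(Z_i = z_i \mid Z_{i-1} = z_{i-1}, Z_0 = 0)\, P(Y_i = y_i \mid X_{[n]} = x_{[n]}, Z_i = z_i) \Big) \\
&= \sum_{z :\, |z| = |\overline{y}|} \prod_{i=1}^{|\overline{y}|} \Big( P(Z_i = z_i \mid Z_{i-1} = z_{i-1})\, \mathbb{1}_{\{y_i = x_{i - z_i}\}} \Big)
\end{aligned} \tag{5}$$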



The BDC has been the most well-studied SEC. In [6], the author surveys the results that were known prior to 2009. To summarize, the best known lower bounds were obtained, chronologically, through bounds on the cutoff rate for sequential decoding [7], bounding the rate with a first-order Markov input [8], reduction to a Poisson-repeat channel [9], analyzing a "jigsaw-puzzle" coding scheme [10], or by directly bounding the information rate by analyzing the channel as a joint renewal process [11]. Recently, [12] and [13] independently gave the capacity of a BDC with small deletion probabilities, and showed that it is achieved by independent and uniformly distributed (i.u.d.) inputs. The known upper bounds for the BDC have been obtained by genie-aided decoder arguments [14], [15]. An idea from [15] was extended to obtain some analytical lower bounds on the capacity of channels that involve substitution errors as well as insertions or deletions [16]. The idea in [12] was extended to obtain a better approximation for the capacity of the BDC with small deletion probabilities in [17].

In contrast to these existing results, our approach explicitly characterizes the achievable information rates in terms of "subsequence-weights", which is a measure relevant in ML decoding for the BDC [6]. Additionally, the method proposed here gives the tight bound on capacity for small deletion probabilities obtained in [12] more directly.¹

For the BRC, [5] obtained lower bounds on the capacity by numerically estimating the capacity per unit cost of the equivalent channel of runs through optimization of 8 and 16 bit codes. Here, we obtain direct analytical lower bounds on the capacity. These, to the best of our knowledge, represent the only analytical bounds for the capacity of the BRC. Moreover, we obtain an exact expression for the Markov-1 rate for the BRC which conclusively disproves the conjecture that the capacity of SECs is a convex function of the channel parameter.

We will use the DRC as a running example of an SEC. In Section VI, we discuss the extension of the results presented to a more general class of SECs.

III. DRC as a Channel with States 

We now construct a channel with states that is equivalent to the DRC introduced in Example 4. Dobrushin's model of the SEC (cf. Definition 1) tracks the output string generated by each input symbol. In our model, we track the input symbol that gave rise to each output symbol.

¹ Note that although we obtain the same lower bound for the capacity of the BDC as in [12], we do not prove a converse here.

A. Channel Model 

Definition 5 (DRC with states): For a fixed $n \in \mathbb{N}$, we write

$$Y_i = X_{\Gamma_i} = X_{i - Z_i}, \quad i \in [N_n] \tag{6}$$

where $Z_i \in \mathbb{Z}$ is the "state" of the channel and

$$N_n = \sup\{i \ge 0 : \Gamma_i \le n \mid \Gamma_0 = 0\}.$$

We will refer to the random variable $N_n$ as the output length corresponding to $n$ input symbols. The state process $\mathcal{Z}$ is independent of the channel input process, and is a first-order Markov process over the set of integers $\mathbb{Z}$ with transition probabilities for each $i \in \mathbb{N}$ given by

$$P(Z_i = z_i \mid Z_{i-1} = z_{i-1}) = \begin{cases} p_r, & z_i = z_{i-1} + 1, \\ p_t\, p_d^{\ell}, & z_i = z_{i-1} - \ell,\ \forall\ \ell \ge 0, \end{cases} \tag{7}$$

where we define $p_t$ as in Equation (4) assuming $(p_d, p_r) \in [0, 1)^2$. We will refer to the process $\Gamma = \{\Gamma_i\}$, where $\Gamma_i = i - Z_i$, as the index process.

We also assume the boundary condition that $Z_0 = \Gamma_0 = 0$, i.e., the channel is perfectly synchronized before transmission commences. Note that the transition probabilities in (7) indeed are well-defined since $\forall\ z_{i-1} \in \mathbb{Z}$, as $p_d < 1$,

$$\sum_{z_i} P(z_i \mid z_{i-1}) = p_r + \sum_{\ell \ge 0} p_t\, p_d^{\ell} = p_r + \frac{p_t}{1 - p_d} = 1.$$

With the above definition, for $\overline{y} \in \overline{\mathbb{Y}}$ and $x_{[n]} \in \mathbb{X}^n$, the channel transition probabilities are given as in Equation (5). Note that in the terms within the parentheses on the right hand side of Equation (5), the first term is completely specified by the transition probabilities (7) of the channel state process $\mathcal{Z}$, and the second term is 0 or 1 accordingly as $y_i \ne x_{i - z_i}$ or $y_i = x_{i - z_i}$, respectively.

For each $n \in \mathbb{N}$, we define the DRC with states as the channel $\mathbf{P}_n = (\mathbb{X}, P_n, \mathbb{Y})$. □
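The state-based description can be simulated by running the Markov chain of Equation (7) and reading off $Y_i = X_{i - Z_i}$ until the index passes $n$. The sketch below is illustrative only: the function names are hypothetical, and the boundary symbol `x0` (standing in for $X_0$ when the channel replicates before the first transmission) is an assumption made here to keep the simulation well defined.

```python
import random


def sample_next_state(z_prev, p_d, p_r, rng):
    """Draw Z_i given Z_{i-1} = z_prev according to Equation (7)."""
    if rng.random() < p_r:
        return z_prev + 1  # replication: the index Gamma_i does not advance
    deletions = 0
    while rng.random() < p_d:  # each skipped input symbol costs a factor p_d
        deletions += 1
    return z_prev - deletions  # transmission after `deletions` deletions


def drc_with_states(x, p_d, p_r, rng, x0=0):
    """Simulate Y_i = X_{i - Z_i} for inputs x_1..x_n stored in x[0..n-1].

    x0 is an assumed boundary symbol X_0, used only when Gamma_i = 0.
    Returns the lists (y, z, gamma), each of length N_n.
    """
    n = len(x)
    y, z, gamma = [], [], []
    z_prev, i = 0, 0  # boundary condition Z_0 = Gamma_0 = 0
    while True:
        i += 1
        z_i = sample_next_state(z_prev, p_d, p_r, rng)
        g_i = i - z_i
        if g_i > n:  # the index has moved past the last input symbol
            break
        y.append(x0 if g_i == 0 else x[g_i - 1])
        z.append(z_i)
        gamma.append(g_i)
        z_prev = z_i
    return y, z, gamma


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.randint(0, 1) for _ in range(20)]
    y, z, gamma = drc_with_states(x, 0.3, 0.2, rng)
    print("states:", z)
    print("index :", gamma)
    print("output:", y)
```

Printing `gamma` for a short input shows the index process holding still on replications and jumping by more than one on deletions.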

We will start by proving a few properties of the output length $N_n$ and the channel state $\mathcal{Z}$ and index $\Gamma$ processes which will be made use of subsequently. We will relegate the proofs of the following two lemmas to Appendices I and II, respectively.

Lemma 6 (Properties of $N_n$): The output length $N_n$ satisfies the following properties:

(i) For any $n \in \mathbb{N}$, $N_n < \infty$ a.s.

(ii) $N_n \to \infty$ as $n \to \infty$ a.s.

(iii) $\frac{N_n}{n} \to \frac{1 - p_d}{1 - p_r}$ a.s. as $n \to \infty$. ■

Lemma 7 (Properties of $\mathcal{Z}$, $\Gamma$): The channel state process $\mathcal{Z}$ and the index process $\Gamma$ satisfy the following properties:

(i) $\mathcal{Z}$ and $\Gamma$ are first-order, time-homogeneous, shift-invariant Markov chains. Further, $\mathcal{Z}$ is irreducible and aperiodic.

(ii) $\Gamma$ is almost surely non-decreasing, i.e.,

$$\Gamma_{i+j} \ge \Gamma_i \quad \forall\ j \ge 0,\ i \in \mathbb{N} \quad \text{a.s.}$$

For any $n \in \mathbb{N}$, a realization of $Z_{[n]}$ such that the corresponding $\Gamma_{[n]}$ realization satisfies the above monotonicity property is called a compatible state path.

(iii) For every $i \in \mathbb{N}$,

$$H(Z_i \mid Z_{i-1}) = H(\Gamma_i \mid \Gamma_{i-1}) = h_2(p_r) + \frac{1 - p_r}{1 - p_d}\, h_2(p_d),$$

where $h_2(x) \triangleq -x \log_2 x - (1 - x) \log_2 (1 - x)$, for $x \in [0, 1]$, is the binary entropy function [18]. Here, we assume from continuity that $0 \log_2 0 = 0$. Consequently, for every $n \in \mathbb{N}$,

$$H(Z_{[n]}) = H(\Gamma_{[n]}) = n \left( h_2(p_r) + \frac{1 - p_r}{1 - p_d}\, h_2(p_d) \right). \quad ■$$
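The closed form in Lemma 7 (iii) is the entropy of the one-step transition distribution in Equation (7), and this can be checked numerically. A minimal sketch follows, assuming the transition law assigns $p_r$ to a unit increase and $p_t p_d^{\ell}$ to a decrease by $\ell \ge 0$; the function names and the truncation parameter `max_deletions` are illustrative.

```python
import math


def h2(x):
    """Binary entropy function in bits, with 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def state_entropy_closed_form(p_d, p_r):
    """h2(p_r) + (1 - p_r) / (1 - p_d) * h2(p_d), as in Lemma 7 (iii)."""
    return h2(p_r) + (1.0 - p_r) / (1.0 - p_d) * h2(p_d)


def state_entropy_numeric(p_d, p_r, max_deletions=2000):
    """Entropy of the one-step transition distribution, summed term by term.

    The geometric tail is truncated at max_deletions; terms that underflow
    to zero are skipped.
    """
    p_t = (1.0 - p_d) * (1.0 - p_r)
    probs = [p_r] + [p_t * p_d ** l for l in range(max_deletions + 1)]
    return -sum(q * math.log2(q) for q in probs if q > 0.0)


if __name__ == "__main__":
    for p_d, p_r in [(0.3, 0.2), (0.0, 0.4), (0.5, 0.0)]:
        print(p_d, p_r,
              state_entropy_closed_form(p_d, p_r),
              state_entropy_numeric(p_d, p_r))
```

The two columns should agree to within floating-point error for any $(p_d, p_r) \in [0,1)^2$.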



Note that the $\mathcal{Z}$ process is not stationary because we fix $Z_0 = 0$. The $\Gamma$ process is clearly not stationary since $\Gamma_i$ depends on $i$. From Lemma 7 (ii), we can show that for $1 \le n \le m < \infty$,

$$N_n \le N_m \quad \text{a.s.} \tag{8}$$



Proposition 8 (Channel Equivalence): For each $n \in \mathbb{N}$, the channels $\mathbf{Q}_n$ and $\mathbf{P}_n$ are equivalent.

Proof: Both $\mathbf{Q}_n$ and $\mathbf{P}_n$ have the same input and output alphabets $\mathbb{X}$ and $\mathbb{Y}$, respectively. The correspondence between the transition probabilities $Q_n$ and $P_n$ in Equations (3) and (5) is evident from the following observations:

(i) For every parsing of $\overline{y} \in \overline{\mathbb{Y}}$ as $\overline{y}_{[n]}$ in Equation (3), there is a corresponding state path $z \in \overline{\mathbb{Z}}$ in Equation (5).

(ii) For every compatible state path $z \in \overline{\mathbb{Z}}$ in Equation (5) (see Lemma 7), there is a corresponding parsing of $\overline{y} \in \overline{\mathbb{Y}}$ in Equation (3).

(iii) For these corresponding parsings of $\overline{y}$ and compatible state paths $z$, the terms within the parentheses on the right hand side of Equation (5), when grouped according to the output symbols arising from the same input symbol, spell out exactly the same probability as the terms $q(\overline{y}_i \mid x_i)$.

Therefore, except on a set of zero probability (state paths that are not compatible), the probability measures $Q_n$ and $P_n$ are equal. This implies the equivalence of the channels $\mathbf{Q}_n$ and $\mathbf{P}_n$. ■


As a consequence of the above equivalence, the results of Theorem 2 and Proposition 3 carry forward to the sequence of channels $\{\mathbf{P}_n\}_{n=1}^{\infty}$ specified by Equations (6) and (7).

Corollary 9 (Dobrushin's results for $\{\mathbf{P}_n\}_{n=1}^{\infty}$): For input $X_{[n]}$ and output $Y_{[N_n]}$ of the channel $\mathbf{P}_n$, the quantity

$$C = \lim_{n \to \infty} \sup_{P(x_{[n]})} \frac{1}{n} I(X_{[n]}; Y_{[N_n]}) = \sup_{\mathcal{X}_{\mathcal{M}}} \lim_{n \to \infty} \frac{1}{n} I(X_{[n]}; Y_{[N_n]}),$$

where $\mathcal{X}_{\mathcal{M}}$ represents stationary, ergodic, Markov processes over $\mathbb{X}$, exists and is equal to the capacity of the sequence of channels $\{\mathbf{P}_n\}_{n=1}^{\infty}$. ■

We will henceforth restrict our attention to this class of input processes. The following is a useful result whose proof is deferred to Appendix III.

Proposition 10 (Stationarity): The channel output process $\mathcal{Y}$ is stationary for stationary input processes $\mathcal{X}$. ■

As a consequence of the above result, the entropy rate $\mathcal{H}(\mathcal{Y})$ of the output process is well-defined [18].

B. Bounds on the Capacity of the DRC 

The formulation of the DRC as a channel with states allows 
us to immediately establish the following. 

Proposition 11 (Simple bounds on C): For the DRC,

$$(1 - p_d)\left(1 - \frac{h_2(p_r)}{1 - p_r}\right) - h_2(p_d) \le C \le 1 - p_d.$$

Proof: We can write

$$\begin{aligned}
I(X_{[n]}; Y_{[N_n]}) &= I(X_{[n]}; Y_{[N_n]}, Z_{[N_n]}) - I(X_{[n]}; Z_{[N_n]} \mid Y_{[N_n]}) \\
&\overset{(a)}{=} I(X_{[n]}; Y_{[N_n]} \mid Z_{[N_n]}) - I(X_{[n]}; Z_{[N_n]} \mid Y_{[N_n]}) \\
&\overset{(b)}{=} (1 - p_d) H(X_{[n]}) - I(X_{[n]}; Z_{[N_n]} \mid Y_{[N_n]}),
\end{aligned} \tag{9}$$

where (a) is true because $\mathcal{X}$ is independent of $\mathcal{Z}$ and (b) follows from the fact that the DRC, given the $\mathcal{Z}$ process realization, is equivalent to a binary erasure channel (BEC) with erasure rate $p_d$. Then,

$$n(1 - p_d) \ge I(X_{[n]}; Y_{[N_n]}) \ge (1 - p_d) H(X_{[n]}) - H(Z_{[N_n]}).$$

From Lemma 7 (iii), and since, for any finite $n$, we have the extra knowledge that $Z_i \ge i - n$ by definition of $N_n$, we can show that

$$H(Z_{[N_n]}) \le E(N_n) \left[ h_2(p_r) + \frac{1 - p_r}{1 - p_d}\, h_2(p_d) \right].$$

Note that the extra information $Z_i \ge i - n$ becomes tautological when $n \to \infty$, and hence

$$\lim_{n \to \infty} \frac{H(Z_{[N_n]})}{n} = \left( \lim_{n \to \infty} \frac{E(N_n)}{n} \right) \left( h_2(p_r) + \frac{1 - p_r}{1 - p_d}\, h_2(p_d) \right).$$

From Lemma 6 and for independent uniformly distributed inputs, the claim follows. ■

Proposition 11 gives bounds on the capacity for $(p_d, p_r) \in [0, 1)^2$. Three special cases of the DRC are of particular interest: the binary deletion channel (BDC) with $p_d = p$, $p_r = 0$; the symmetric deletion-replication channel (SDRC) with $p_d = p_r = p$; and the binary replication channel (BRC) with $p_d = 0$, $p_r = p$. Specializing Proposition 11 to these cases gives us the following results.

Corollary 12 (Bounds on C for special cases): We have

$$1 - p - h_2(p) \le C_{\text{BDC}} \le 1 - p,$$

$$1 - p - 2 h_2(p) \le C_{\text{SDRC}} \le 1 - p,$$

$$1 - \frac{h_2(p)}{1 - p} \le C_{\text{BRC}} \le 1. \quad ■$$
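The bounds of Proposition 11 and their three specializations are straightforward to tabulate. The sketch below evaluates them as stated; the function name `drc_capacity_bounds` is illustrative, and the lower bound is returned unclipped, so it can be negative where the bound is vacuous.

```python
import math


def h2(x):
    """Binary entropy function in bits, with 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def drc_capacity_bounds(p_d, p_r):
    """(lower, upper) bounds on the DRC capacity from Proposition 11."""
    lower = (1.0 - p_d) * (1.0 - h2(p_r) / (1.0 - p_r)) - h2(p_d)
    upper = 1.0 - p_d
    return lower, upper


if __name__ == "__main__":
    p = 0.05
    print("BDC :", drc_capacity_bounds(p, 0.0))   # 1 - p - h2(p) <= C <= 1 - p
    print("SDRC:", drc_capacity_bounds(p, p))     # 1 - p - 2 h2(p) <= C <= 1 - p
    print("BRC :", drc_capacity_bounds(0.0, p))   # 1 - h2(p)/(1 - p) <= C <= 1
```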



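Although the bounds in Corollary 12 have simple closed form expressions with well known information theoretic functions, they are loose compared to the best known (analytical or numerical) bounds for the capacity of these channels. We can, however, improve these bounds. We have from Equation (9)

$$I(X_{[n]}; Y_{[N_n]}) = (1 - p_d) H(X_{[n]}) + I(Y_{[N_n]}; Z_{[N_n]}) - H(Z_{[N_n]}) + H(Z_{[N_n]} \mid X_{[n]}, Y_{[N_n]}). \tag{10}$$

Writing the entropy rate of the input process $\mathcal{X}$ as $\mathcal{H}(\mathcal{X})$ and defining

$$\mathcal{H}(\mathcal{Z}) \triangleq \lim_{n \to \infty} \frac{H(Z_{[N_n]})}{n}, \qquad \mathcal{H}(\mathcal{Y}) \triangleq \lim_{n \to \infty} \frac{H(Y_{[N_n]})}{n}, \qquad \text{and} \qquad \mathcal{H}(\mathcal{Z} \mid \mathcal{X}, \mathcal{Y}) \triangleq \lim_{n \to \infty} \frac{H(Z_{[N_n]} \mid X_{[n]}, Y_{[N_n]})}{n},$$

from Lemma 7 and Equation (10), we can bound

$$C \ge \sup_{\mathcal{X}}\ (1 - p_d) \mathcal{H}(\mathcal{X}) + \mathcal{H}(\mathcal{Z} \mid \mathcal{X}, \mathcal{Y}) - \frac{1 - p_d}{1 - p_r}\, h_2(p_r) - h_2(p_d).$$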



Lemma 13: Let $H_n = \frac{1}{n} H(Z_{[N_n]} \mid X_{[n]}, Y_{[N_n]})$ for $n \in \mathbb{N}$. Then, for the sequence $\{H_n\}_{n=1}^{\infty}$,

$$\mathcal{H}(\mathcal{Z} \mid \mathcal{X}, \mathcal{Y}) = \lim_{n \to \infty} H_n = \sup_{n \ge 1} H_n. \quad ■$$

The proof is given in Appendix IV. The above result implies that if we could evaluate (or lower bound) $H_n$ for some $n$, that could be used to estimate a lower bound on $C$.

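Proposition 14: For the DRC,

$$C \ge \sup_{\mathcal{X}} \left( \mathcal{H}(\mathcal{X}) + \frac{H(Z_1 \mid \mathcal{X}, \mathcal{Y})}{1 - p_r} \right)(1 - p_d) - \frac{1 - p_d}{1 - p_r}\, h_2(p_r) - h_2(p_d).$$

Proof: We have

$$\begin{aligned}
H_n &= \frac{1}{n} H(Z_{[N_n]} \mid X_{[n]}, Y_{[N_n]}) \\
&= \frac{1}{n} E\left( \sum_{i=1}^{N_n} H(Z_i \mid Z_{i-1} = z_{i-1}, X_{[i-1-z_{i-1}:n]}, Y_{[i:N_n]}) \right)
\end{aligned}$$

where the last equality follows from the conditional independence of $Z_i$ on $Z_{[i-2]}$, $X_{[i-2-z_{i-1}]}$ and $Y_{[i-1]}$ given $Z_{i-1}$. From the time-homogeneity and shift-invariance of the $\mathcal{Z}$ process (see Lemma 7), as $n \to \infty$, the summand in the above expression

$$H(Z_i \mid Z_{i-1} = z_{i-1}, X_{[i-1-z_{i-1}:n]}, Y_{[i:N_n]}) \to H(Z_1 \mid Z_0 = 0, X_{\mathbb{N}}, Y_{\mathbb{N}}) = H(Z_1 \mid \mathcal{X}, \mathcal{Y}).$$

Since $\frac{E(N_n)}{n} \to \frac{1 - p_d}{1 - p_r}$, optimizing over input processes $\mathcal{X}$ gives us the desired result. ■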

It is not easy to evaluate the bound in Proposition 14. However, we can further lower bound the capacity by introducing some conditioning.

Lemma 15: The sequence of lower bounds $\{D_i\}_{i=1}^{\infty}$, where

$$D_i \triangleq \left( \mathcal{H}(\mathcal{X}) + \frac{H(Z_1 \mid Z_i, \mathcal{X}, \mathcal{Y})}{1 - p_r} \right)(1 - p_d) - \frac{1 - p_d}{1 - p_r}\, h_2(p_r) - h_2(p_d), \quad i \in \mathbb{N},$$

is non-decreasing.

Proof: Since we have introduced extra conditioning, the $D_i$'s are lower bounds. We have

$$\begin{aligned}
H(Z_1 \mid Z_{i+1}) &= H(Z_1, Z_i \mid Z_{i+1}) - H(Z_i \mid Z_1, Z_{i+1}) \\
&= H(Z_i \mid Z_{i+1}) + H(Z_1 \mid Z_{[i:i+1]}) - H(Z_i \mid Z_1, Z_{i+1}) \\
&\overset{(a)}{=} H(Z_i \mid Z_{i+1}) + H(Z_1 \mid Z_i) - H(Z_i \mid Z_1, Z_{i+1}) \\
&= H(Z_1 \mid Z_i) + I(Z_1; Z_i \mid Z_{i+1}) \\
&\ge H(Z_1 \mid Z_i)
\end{aligned}$$

where (a) follows from the Markovity of the $\mathcal{Z}$ process. Since conditioning on $\mathcal{X}$ and $\mathcal{Y}$ preserves the above chain of inequalities, we have $H(Z_1 \mid Z_{i+1}, \mathcal{X}, \mathcal{Y}) \ge H(Z_1 \mid Z_i, \mathcal{X}, \mathcal{Y})\ \forall\ i \ge 1$. Hence $\{D_i\}_{i=1}^{\infty}$ is non-decreasing. ■

Optimizing $D_1$ over stationary, ergodic, Markov input processes $\mathcal{X}$ gives the bound in Proposition 11. Therefore, for increasing $i$, we have bounds better than the one in Proposition 11. In particular, as $i \to \infty$, following the proof of Lemma 6 (iii), we can see that $\frac{Z_i}{i} \to \frac{p_r - p_d}{1 - p_d}$ a.s., so that the knowledge of $Z_i$ becomes tautological in the limit, and consequently,

$$\sup_{\mathcal{X}} \lim_{i \to \infty} D_i = \sup_{\mathcal{X}} \sup_{i \ge 1} D_i$$

gives us the bound in Proposition 14.



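Alternatively, instead of bounding the information rate as in Proposition 14, we can write the following as an immediate consequence of Equation (10) and an argument similar to the one made in the proof of Proposition 14.

Proposition 16 (Information rates for the DRC): For the DRC,

$$\begin{aligned}
C &= \sup_{\mathcal{X}} \Big( (1 - p_d) \mathcal{H}(\mathcal{X}) - \mathcal{H}(\mathcal{Z} \mid \mathcal{Y}) + \mathcal{H}(\mathcal{Z} \mid \mathcal{X}, \mathcal{Y}) \Big) \\
&= (1 - p_d) \sup_{\mathcal{X}} \left[ \mathcal{H}(\mathcal{X}) + \frac{H(Z_1 \mid \mathcal{X}, \mathcal{Y}) - H(Z_1 \mid \mathcal{Y})}{1 - p_r} \right]. \quad ■
\end{aligned}$$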
X 
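$$\begin{aligned}
\mathfrak{H}_{i,m} &= \sum_{x_{[m+i-1]}} \frac{1}{2^{m+i-1}}\, \mathfrak{h}(x_{[m+i-1]}), \\
\mathfrak{h}(x_{[m+i-1]}) &= \sum_{y_{[i-1]}} \frac{w_{y_{[i-1]}}(x_{[m+i-1]})}{\binom{m+i-1}{m}}\, \eta(x_{[m+i-1]}, y_{[i-1]}), \\
\eta(x_{[m+i-1]}, y_{[i-1]}) &= -\sum_{z=-m}^{0} \mathbb{1}_{\{x_{1-z} = y_1\}} \frac{w_{y_{[2:i-1]}}(x_{[2-z:m+i-1]})}{w_{y_{[i-1]}}(x_{[m+i-1]})} \log_2 \left( \mathbb{1}_{\{x_{1-z} = y_1\}} \frac{w_{y_{[2:i-1]}}(x_{[2-z:m+i-1]})}{w_{y_{[i-1]}}(x_{[m+i-1]})} \right)
\end{aligned} \tag{11}$$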



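Following arguments similar to the ones used in Lemma 15, we can show the following.

Lemma 17: The sequence of lower bounds $\{R_i\}_{i=1}^{\infty}$, where

$$R_i \triangleq (1 - p_d) \left( \mathcal{H}(\mathcal{X}) + \frac{H(Z_1 \mid Z_i, \mathcal{X}, \mathcal{Y}) - H(Z_1 \mid \mathcal{Y})}{1 - p_r} \right),$$

is non-decreasing, and

$$C = \sup_{\mathcal{X}} \lim_{i \to \infty} R_i = \sup_{\mathcal{X}} \sup_{i \ge 1} R_i. \quad ■$$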

The task of finding the rate-maximizing input distributions
appears to be tough, with no theoretical insight² or efficient
numerical algorithms. Often, to establish lower bounds on
achievable rates, special classes of input processes are con- 
sidered, and we will resort to a similar strategy here to obtain 
some expressions for the bounds we have so far developed. 
The following section will consider special cases of the DRC 
wherein there are either only deletions, i.e., the BDC, or 
only replications, i.e., the BRC. In a subsequent section, the 
symmetric DRC will be studied. All bounds developed in the 
next section are similar to the generic bounds developed thus 
far. 

IV. Channels with Deletions or Replications 

For the case of the BDC or the BRC, evaluating some of the bounds developed in the previous section is somewhat easier, owing to the fact that the $\mathcal{Z}$ process is monotonic in these two special cases, i.e., it is non-increasing or non-decreasing with increments of at most one, respectively. This monotonicity in $\mathcal{Z}$ implies that the $\Gamma$ process is strictly increasing for the BDC and non-decreasing with increments of at most one for the BRC. This translates to the output being a subsequence of the input sequence for the BDC and vice versa for the BRC.

A. Information Rates for the BDC 

In this subsection we estimate the information rates possible over the BDC, i.e., $p_d = p$, $p_r = 0$, when the input process is either i.u.d. or when it is a first-order Markov process.

For the BDC with i.u.d. inputs, we can easily show that $\mathcal{Y}$ is also an i.u.d. sequence. Consequently,

$$\frac{I(Y_{[N_n]}; Z_{[N_n]})}{n} \to 0 \quad \text{as } n \to \infty$$

because the only information obtained from $Y_{[N_n]}$ about $Z_{[N_n]}$ is the length of the vector, and this information vanishes in the limit as $n \to \infty$. Therefore, we have from Equation (10) that the lower bound in Proposition 14 is actually the symmetric information rate (SIR). We are hence interested in evaluating $D_i^{\text{iud}}$ as defined in Lemma 15, where the superscript "iud" stands for independent, uniformly distributed inputs. In particular, we have the SIR

$$C^{\text{iud}}_{\text{BDC}} = \lim_{i \to \infty} D_i^{\text{iud}} = \sup_{i \ge 1} D_i^{\text{iud}}. \tag{12}$$

We start with some definitions and notation.

² A new result on the BDC with small deletion probability [17] provides a partial answer to this question.

We start with some definitions and notation. 



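Definition 18 (Subsequence weights): We call a vector $x_{\mathbb{A}}$ a subsequence of a vector $x_{\mathbb{B}}$ if $\mathbb{A} \subseteq \mathbb{B}$ and the order of the elements in $\mathbb{A}$ is the same as the order in which those elements appear in $\mathbb{B}$. For ease of notation, we will write $w_{y_{[i]}}(x_{[j]})$ to denote the number of subsequences of $x_{[j]} \in \mathbb{X}^j$ that are the same as $y_{[i]} \in \mathbb{X}^i$, which is referred to as the $y_{[i]}$-subsequence weight of the vector $x_{[j]}$. We can write

$$w_{y_{[i]}}(x_{[j]}) = \sum_{\mathbb{S} \subseteq [j] :\, |\mathbb{S}| = i} \mathbb{1}_{\{x_{\mathbb{S}} = y_{[i]}\}}$$

where the elements of the set $\mathbb{S}$ are arranged in ascending order. Clearly, $w_{y_{[i]}}(x_{[j]}) = 0$ for $i > j$. We define $w_{\lambda}(x_{[j]}) = 1\ \forall\ x_{[j]} \in \mathbb{X}^j$ for $j \ge 0$. □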
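Subsequence weights can be counted without enumerating index sets, using a standard dynamic program over prefixes of $y$. The sketch below is an illustration of Definition 18 rather than a procedure taken from the paper; both function names are hypothetical, and the brute-force version is included only as a cross-check against the defining sum.

```python
from itertools import combinations


def subsequence_weight(y, x):
    """Number of index sets S in x (|S| = len(y)) with x_S equal to y.

    w[k] holds the number of ways the prefix y[:k] embeds into the part
    of x scanned so far; the empty string embeds exactly once.
    """
    w = [1] + [0] * len(y)
    for symbol in x:
        # Go backwards so each symbol of x extends an embedding only once.
        for k in range(len(y), 0, -1):
            if y[k - 1] == symbol:
                w[k] += w[k - 1]
    return w[len(y)]


def subsequence_weight_bruteforce(y, x):
    """Direct evaluation of the sum over index sets S in Definition 18."""
    return sum(
        1
        for idx in combinations(range(len(x)), len(y))
        if all(x[i] == c for i, c in zip(idx, y))
    )


if __name__ == "__main__":
    print(subsequence_weight("01", "0011"))   # two choices of 0, two of 1
    print(subsequence_weight("", "0101"))     # empty string has weight 1
    print(subsequence_weight("000", "00"))    # longer than x, weight 0
```

The dynamic program runs in time proportional to $|x| \cdot |y|$, whereas the defining sum has $\binom{j}{i}$ terms.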

Definition 19 (Runs and run-lengths): For a binary sequence, a run is a maximal block of contiguous 0s or 1s. The run-length of a run is the number of symbols in it. We denote by $r_1(x_{[j]})$ the length of the first run in the vector $x_{[j]} \in \mathbb{X}^j$, $j \ge 1$. Clearly, $1 \le r_1(x_{[j]}) \le |x_{[j]}| = j$. □

We will denote by $\mathbb{Z}^i_{\uparrow}$ and $\mathbb{Z}^i_{\downarrow}$ the sets of non-decreasing and non-increasing vectors of length $i$, respectively, for $i \ge 1$.

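Theorem 20 (SIR for the BDC): For the BDC,

$$C^{\text{iud}}_{\text{BDC}} = 1 - p - h_2(p) + (1 - p) \left( \lim_{i \to \infty} \sum_{m \ge 0} \mathfrak{S}_{i,m}\, p^m (1 - p)^i \right),$$

where $\mathfrak{S}_{i,m} \triangleq \binom{m+i-1}{m} \mathfrak{H}_{i,m}$, with $\mathfrak{H}_{i,m} = H(Z_1 \mid Z_i = -m, \mathcal{X}, \mathcal{Y})$ as given in Equation (11).

Proof: For the BDC, we have from Lemma 15 that

$$D_i^{\text{iud}} = 1 - p - h_2(p) + (1 - p) H(Z_1 \mid Z_i, \mathcal{X}, \mathcal{Y}).$$

From Equation (12), we need to show that

$$H(Z_1 \mid Z_i, \mathcal{X}, \mathcal{Y}) = \sum_{m \ge 0} \mathfrak{S}_{i,m}\, p^m (1 - p)^i.$$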
We first note that

$$H(Z_1 \mid Z_i, \mathcal{X}, \mathcal{Y}) = H(Z_1 \mid Z_i, X_{[i - Z_i - 1]}, Y_{[i-1]}).$$

Clearly, the above entropy term is zero for $i = 1$. For $i \ge 2$, given $Z_i = -m$, $X_{[m+i-1]} = x_{[m+i-1]}$ and $Y_{[i-1]} = y_{[i-1]}$, it is easy to see that

$$Z_1 \in \{ z \in \{0, -1, \ldots, -m\} : x_{1-z} = y_1,\ w_{y_{[2:i-1]}}(x_{[2-z:m+i-1]}) > 0 \}.$$

That is, $Z_1 = z$ only if $x_{1-z}$ and $y_1$ match, and the subsequent part of the output vector $y_{[2:i-1]}$ is a subsequence of the subsequent part of the input vector $x_{[2-z:m+i-1]}$. Also, for non-increasing $z_{[i-1]}$, i.e., $z_{[i-1]} \in \mathbb{Z}^{i-1}_{\downarrow}$ (which, as noted earlier, is true for the BDC),

$$P(Z_{[i-1]} = z_{[i-1]}, Z_i = -m \mid X_{[m+i-1]} = x_{[m+i-1]}, Y_{[i-1]} = y_{[i-1]}) \propto \mathbb{1}_{\{x_{[i-1] - z_{[i-1]}} = y_{[i-1]}\}}\, p_t^i\, p_d^m$$

where $p_d = 1 - p_t = p$, so that for $0 \ge z \ge -m$,

$$P(Z_1 = z, Z_i = -m \mid X_{[m+i-1]} = x_{[m+i-1]}, Y_{[i-1]} = y_{[i-1]}) \propto \mathbb{1}_{\{x_{1-z} = y_1\}}\, p_t^i\, p_d^m \cdot w_{y_{[2:i-1]}}(x_{[2-z:m+i-1]}),$$

$$P(Z_i = -m \mid X_{[m+i-1]} = x_{[m+i-1]}, Y_{[i-1]} = y_{[i-1]}) \propto w_{y_{[i-1]}}(x_{[m+i-1]})\, p_t^i\, p_d^m,$$

with the same proportionality constant in each case, and hence, when $w_{y_{[i-1]}}(x_{[m+i-1]}) > 0$,

$$P(Z_1 = z \mid Z_i = -m, X_{[m+i-1]} = x_{[m+i-1]}, Y_{[i-1]} = y_{[i-1]}) = \mathbb{1}_{\{x_{1-z} = y_1\}} \frac{w_{y_{[2:i-1]}}(x_{[2-z:m+i-1]})}{w_{y_{[i-1]}}(x_{[m+i-1]})}.$$

Since, with i.u.d. inputs, $P(X_{[m+i-1]} = x_{[m+i-1]} \mid Z_i = -m) = 2^{-(m+i-1)}$ and

$$P(Y_{[i-1]} = y_{[i-1]} \mid X_{[m+i-1]} = x_{[m+i-1]}, Z_i = -m) = \frac{w_{y_{[i-1]}}(x_{[m+i-1]})}{\binom{m+i-1}{m}},$$

we have that $H(Z_1 \mid Z_i = -m, \mathcal{X}, \mathcal{Y}) = \mathfrak{H}_{i,m}$ as in Equation (11). By noting that

$$P(Z_i = -m \mid Z_0 = 0) = \binom{m+i-1}{m} p^m (1 - p)^i$$

from Equation (7) (with $p_d = p$, $p_r = 0$, $p_t = 1 - p$), we have the desired result. ■

Although evaluating $\mathfrak{S}_{i,m}$ in general is hard since we are required to count subsequence weights of sequences, we can evaluate it in two specific cases: for every $m$ when $i = 2$ (when all but a single bit are deleted) and for all $i$ when $m = 1$ (when only a single bit is deleted). We examine these two cases in detail in Appendices V-A and V-B and state the results here.
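Corollary 21 (Lower bound for $C^{\text{iud}}_{\text{BDC}}$): For the BDC,

$$C^{\text{iud}}_{\text{BDC}} \ge D_2^{\text{iud}} = 1 - p - h_2(p) + \frac{4 (1 - p)^3}{(2 - p)^2} \sum_{m \ge 2} m \left( \frac{p}{2 - p} \right)^{m-1} \log_2 m. \quad ■$$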



Corollary 22 (Small deletion probability SIR): For the BDC,

$$C^{\text{iud}}_{\text{BDC}} = 1 + p \log_2 p - d\, p + O(p^2)$$

where $d \approx 1.154163765$. ■

Similar bounds for symmetric first-order Markov input processes are considered in Appendix V-C. Fig. 1 plots the bounds for $C_{\text{BDC}}$.
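For small $i$ and $m$, the conditional entropy $H(Z_1 \mid Z_i = -m, \mathcal{X}, \mathcal{Y})$ in Theorem 20 can be evaluated by exhaustive enumeration of input and output vectors, which gives a numerical handle on the bounds $D_i^{\text{iud}}$. The sketch below assumes the posterior of $Z_1$ is the ratio of subsequence weights derived in the proof of Theorem 20; the function names and the truncation parameter `max_m` are illustrative, and truncating the sum over $m$ can only lower the computed value, since all terms are non-negative.

```python
import math
from itertools import product


def h2(x):
    """Binary entropy function in bits, with 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def subsequence_weight(y, x):
    """Number of ways y occurs as a subsequence of x (Definition 18)."""
    w = [1] + [0] * len(y)
    for symbol in x:
        for k in range(len(y), 0, -1):
            if y[k - 1] == symbol:
                w[k] += w[k - 1]
    return w[len(y)]


def cond_entropy_first_state(i, m):
    """H(Z_1 | Z_i = -m, X, Y) for i.u.d. inputs, by brute force."""
    if i < 2:
        return 0.0
    n = m + i - 1
    total = 0.0
    for x in product((0, 1), repeat=n):
        for y in product((0, 1), repeat=i - 1):
            w = subsequence_weight(y, x)
            if w == 0:
                continue
            eta = 0.0
            for z in range(m + 1):  # Z_1 = -z: first z inputs deleted
                if x[z] == y[0]:
                    wz = subsequence_weight(y[1:], x[z + 1:])
                    if wz > 0:
                        q = wz / w
                        eta -= q * math.log2(q)
            total += w * eta
    return total / (2 ** n * math.comb(n, m))


def d_iud(i, p, max_m=6):
    """Truncated evaluation of D_i^iud = 1 - p - h2(p) + (1-p) H(Z_1|Z_i,X,Y)."""
    s = sum(
        math.comb(m + i - 1, m) * p ** m * (1.0 - p) ** i
        * cond_entropy_first_state(i, m)
        for m in range(max_m + 1)
    )
    return 1.0 - p - h2(p) + (1.0 - p) * s


if __name__ == "__main__":
    p = 0.01
    d = 1.154163765  # constant quoted in Corollary 22
    print("simple bound :", 1.0 - p - h2(p))
    print("D_2 (approx) :", d_iud(2, p))
    print("D_3 (approx) :", d_iud(3, p))
    print("small-p SIR  :", 1.0 + p * math.log2(p) - d * p)
```

The enumeration grows as $2^{m+2i-2}$, so this is practical only for small $i$ and small `max_m`; for small $p$ the neglected terms are of order $p^{\texttt{max\_m}+1}$.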




Fig. 1. Bounds on the capacity for the BDC in bits per channel use as a function of the deletion probability $p$. $D_2^{\text{iud}}$ (cf. Equation (16)) is shown as the long-dashed blue line and $C^{\text{iud}}$ (or equivalently $D_{\infty}^{\text{iud}}$) with the $O(p^2)$ term dropped as the solid red line (cf. Equation (19)). The best known numerical lower [11] and upper bounds [13] are shown as black and white circles respectively. The best known lower bound as $p$ approaches 1 [9] is shown as the dash-dotted green line. The inset shows the bounds for small $p$ values where the red solid curve is known to be tight from [12].



B. Information Rates for the BRC 

In this subsection, we will consider information rates for the BRC, i.e., $p_d = 0$, $p_r = p$. As in the previous subsection, we will consider i.u.d. and symmetric first-order Markov inputs.

For the BRC, the $\mathcal{Z}$ process is non-decreasing. Moreover, when it increases, the increment is at most 1 at each time instant. This simplifies the evaluation of information rates and we will, in fact, be able to write exact expressions for the Markov-1 rates, as will be shown shortly. In this case, even when the input is i.u.d., the term

$$\frac{I(Y_{[N_n]}; Z_{[N_n]})}{n} \nrightarrow 0 \quad \text{as } n \to \infty$$

in the normalized version of Equation (10). Hence the expression for the information rate in Proposition 16 will prove to be more useful in this subsection.



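Theorem 23 (Markov-1 Rates for the BRC): For the BRC, the Markov-1 rate is given as in Equation (13).

$$C^{\text{M1}}_{\text{BRC}} = \max_{\alpha} \left\{ h_2(\alpha) + \alpha \sum_{l \ge 1} \left( (1 - \alpha) \frac{1 - p}{p} \right)^{l} \left( \sum_{k > l} \binom{k}{l} p^k\, h_2\!\left( \frac{l}{k} \right) \right) - \frac{p + (1 - \alpha)(1 - p)}{1 - p}\, h_2\!\left( \frac{(1 - \alpha)(1 - p)}{p + (1 - \alpha)(1 - p)} \right) \right\} \tag{13}$$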



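Proof: First we note that since $Z_1 \in \{0, 1\}$, we have $H(Z_1 \mid \mathcal{X}, \mathcal{Y}) = E[h_2(P(Z_1 = 0 \mid x_{\mathbb{N}}, y_{\mathbb{N}}))]$ and $H(Z_1 \mid \mathcal{Y}) = E[h_2(P(Z_1 = 0 \mid y_{\mathbb{N}}))]$. Further we have for the BRC that whenever $Y_i \ne Y_{i-1}$, we must have $Z_i = Z_{i-1}$ or equivalently $\Gamma_i = \Gamma_{i-1} + 1$. This means that $Z_1$ is independent of subsequent runs of $\mathcal{Y}$ (and $\mathcal{X}$) given the first run of $\mathcal{Y}$ (and $\mathcal{X}$) since we can achieve synchronization at the end of each run. Thus we can write the conditional probabilities $P(Z_1 = 0 \mid \mathcal{X}, \mathcal{Y})$ and $P(Z_1 = 0 \mid \mathcal{Y})$ in terms of the first runs of $\mathcal{X}$ and $\mathcal{Y}$, i.e., $P(Z_1 = 0 \mid x_{\mathbb{N}}, y_{\mathbb{N}}) = P(Z_1 = 0 \mid r_1(x_{\mathbb{N}}), r_1(y_{\mathbb{N}}))$ and $P(Z_1 = 0 \mid y_{\mathbb{N}}) = P(Z_1 = 0 \mid r_1(y_{\mathbb{N}}))$. Note that we assume that $Z_0 = 0$ so that $Y_0 = X_0$. Thus, if $x_1 \ne x_0$, then $Z_1$ is 0 or 1 accordingly as $y_1$ is not equal or equal to $y_0$, respectively. This means that there is no uncertainty in $Z_1$ given the output sequence (and the assumption that $x_0 = y_0 = 0$, which can be made without loss of generality). Therefore, in estimating the entropy of $Z_1$ given the output sequence, or the output and the input sequences, we can confine our attention to those sequences $x_{\mathbb{N}}$ and $y_{\mathbb{N}}$ whose first runs are comprised of zeros. We shall denote such runs as $r_1^0(\cdot)$. For a first-order Markov input process, we have, for $l \ge 0$,

$$P(r_1^0(x_{\mathbb{N}}) = l) = (1 - \alpha)^l \alpha,$$

and we can get from the definition of the BRC that

$$P(r_1^0(y_{\mathbb{N}}) = k \mid r_1^0(x_{\mathbb{N}}) = l) = \binom{k}{l} (1 - p)^{l+1} p^{k-l}$$

for $k \ge l$. Consequently, we have

$$P(r_1^0(y_{\mathbb{N}}) = k) = \sum_{l=0}^{k} (1 - \alpha)^l \alpha \cdot \binom{k}{l} (1 - p)^{l+1} p^{k-l} = \alpha (1 - p) \big( p + (1 - \alpha)(1 - p) \big)^k.$$

Since $Z_1 = 0$ excludes the first bit in the received sequence from being a replication, we can easily obtain

$$P(Z_1 = 0 \mid r_1^0(x_{\mathbb{N}}) = l, r_1^0(y_{\mathbb{N}}) = k) = \frac{l}{k}$$

for $k \ge l + \mathbb{1}_{\{l = 0\}}$. For $k \ge 1$,

$$\begin{aligned}
P(Z_1 = 0 \mid r_1^0(y_{\mathbb{N}}) = k) &= \frac{\sum_{l=0}^{k} (1 - \alpha)^l \alpha \binom{k}{l} (1 - p)^{l+1} p^{k-l} \left( \frac{l}{k} \right)}{P(r_1^0(y_{\mathbb{N}}) = k)} \\
&= \frac{(1 - \alpha)(1 - p)}{p + (1 - \alpha)(1 - p)}.
\end{aligned}$$

Therefore,

$$\begin{aligned}
H(Z_1 \mid \mathcal{X}, \mathcal{Y}) &= \sum_{l \ge 0}\ \sum_{k \ge l + \mathbb{1}_{\{l=0\}}} (1 - \alpha)^l \alpha \binom{k}{l} (1 - p)^{l+1} p^{k-l}\, h_2\!\left( \frac{l}{k} \right) \\
&= \alpha (1 - p) \sum_{l \ge 1} \left( (1 - \alpha) \frac{1 - p}{p} \right)^{l} \sum_{k > l} \binom{k}{l} p^k\, h_2\!\left( \frac{l}{k} \right)
\end{aligned}$$

and

$$\begin{aligned}
H(Z_1 \mid \mathcal{Y}) &= \sum_{k \ge 1} \alpha (1 - p) \big( p + (1 - \alpha)(1 - p) \big)^k \times h_2\!\left( \frac{(1 - \alpha)(1 - p)}{p + (1 - \alpha)(1 - p)} \right) \\
&= \big( p + (1 - \alpha)(1 - p) \big)\, h_2\!\left( \frac{(1 - \alpha)(1 - p)}{p + (1 - \alpha)(1 - p)} \right).
\end{aligned}$$

Substituting these in Proposition 16 specialized to the BRC and first-order Markov inputs, we have the desired result. ■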



The following results are shown in Appendix VI.

Corollary 24 (Lower bound for C^ c ): For the BRC, 



C. 



Ml 
BRC 



> R^ 1 = hn 



1 



(l-p)(4f + l) 



for < p < p* 



2p 
1-p 
1 

r^p 

0.734675821. 



(1 -p)4P -p 
4? + 1 



4P 
4? + 1 



p(4P + 1) 



4' J 



Corollary 25 (Small replication probability SIR): For the BRC,

$$C^{\mathrm{i.u.d.}}_{\mathrm{BRC}} = 1 + p \log_2 p + r p + O(p^2)$$

where $r \approx 0.845836235$. ■

Fig. 2 plots these bounds. Note that the SIR and the Markov-1 rate are non-convex in $p$. Further, it appears that the Markov-1 rate (and the SIR) are zero for some values of $p < 1$. However, this behavior is due to the fact that the term

$$\sum_{k > l} \binom{k}{l} p^{k} h_2\!\left( \frac{l}{k} \right)$$

in Equation (13) is computed only up to a finite value of $k$ (the curves in Fig. 2 are, therefore, lower bounds for the Markov-1 rate and the SIR). For values of $p$ close to 1, more terms in this sum need to be considered to get a better estimate of the achievable rates.
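The effect of truncating the inner sum can be illustrated with a short numerical sketch. The code below is an assumption-laden illustration, not the authors' implementation: it evaluates the objective of the Markov-1 expression as read from Theorem 23 and its proof, with the sum over $k$ cut off at a hypothetical parameter `k_max`. The names `h2`, `brc_markov1_rate`, and `k_max` are introduced here for illustration only. Since every dropped term is non-negative, the truncated value is a lower bound on the untruncated one.

```python
import math


def h2(x):
    """Binary entropy function in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def brc_markov1_rate(p, alpha, k_max=200):
    """Truncated evaluation of the Markov-1 objective for the BRC.

    Assumed form (read from Theorem 23 and its proof):
        h2(alpha)
        + alpha * sum_{l>=1} sum_{k>l} C(k,l) q^l p^(k-l) h2(l/k)
        - beta / (1 - p) * h2(q / beta),
    with q = (1 - alpha)(1 - p) and beta = p + q.
    The sum over k is cut off at k_max (a hypothetical parameter).
    """
    if not (0.0 < p < 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("this sketch needs 0 < p < 1 and 0 < alpha < 1")
    q = (1.0 - alpha) * (1.0 - p)
    beta = p + q
    inner = 0.0
    for k in range(2, k_max + 1):
        for l in range(1, k):
            # Work in the log domain to avoid overflow in C(k, l).
            log_w = (math.lgamma(k + 1) - math.lgamma(l + 1)
                     - math.lgamma(k - l + 1)
                     + l * math.log(q) + (k - l) * math.log(p))
            inner += math.exp(log_w) * h2(l / k)
    return h2(alpha) + alpha * inner - beta / (1.0 - p) * h2(q / beta)
```

Fixing `alpha = 0.5` gives the corresponding truncated estimate for i.u.d. inputs, and a coarse grid search over `alpha` approximates the maximization. Increasing `k_max` can only raise the returned value, which matches the remark that more terms are needed for $p$ close to 1.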

Remark 2: It was expected that the capacity of a memoryless SEC was a convex function of the channel parameters. Although this conjecture seems to be true for the BDC [19], we see that this conjecture is false for the BRC. Note that the lower bounds in [5] themselves lead one to question the conjecture (cf. Fig. 2). However, the Markov-1 rate for the BRC in Equation (13) settles this conjecture as being false for the BRC. This is because if the capacity were convex in the replication probability, no rate larger than $(1 - p)$ would be achievable, which is clearly not the case as can be seen from Fig. 2. This implies that, in general, in the presence of synchronization errors, the capacity is not convex in the channel parameters. In particular, it is possible that the capacity for the BDC is non-convex as well.

Fig. 2. Lower bounds on the capacity for the BRC. The bound $R^{\mathrm{M1}}$ from Corollary 24 is shown as the long-dashed blue line and the Markov-1 rate in Equation (13) is shown as the solid red line. The SIR ($\alpha = \frac{1}{2}$ in Equation (13)) is the dash-dotted green line. The numerical lower bounds in [5] are shown as black circles. The inset shows the bounds for small replication probabilities.

V. Channels with Deletions and Replications 

Although the bounds in the previous section provide us some idea of the achievable information rates for the BDC and the BRC, they do not generalize in a straightforward manner to an SEC with both deletions and replications.³ In order to obtain bounds when both deletions and replications are present, we take a different approach.

A. Approximate Non-Stationary Channels 

We construct a sequence of channels that approximate the DRC $P_n$. To this end, we fix $m \in \mathbb{Z}_+$ and let

$$Z_i^{(m)} = \begin{cases} Z_i, & |Z_i| \le m \\ m \, \mathrm{sgn}(Z_i), & |Z_i| > m \end{cases} \tag{14}$$

for $Z_i$s given by the $Z$ process defined in Section III with $\mathrm{sgn} : \mathbb{R} \to \{\pm 1\}$ defined as

$$\mathrm{sgn}(x) = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0. \end{cases}$$

³ It is possible to obtain, albeit with a lot more effort than in the cases of the BDC or the BRC, the lower bound for a first-order Markov input process for a DRC. We shall omit this here.


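We then define the channel model for the $m$-th approximating channel $P^{\dagger}_{n,m} = (\mathcal{X}, \mathcal{Y}, P^{\dagger}_{n,m})$ as was done for $P_n$,

$$Y_i^{(m)} = X_{\Gamma_i^{(m)}} = X_{i - Z_i^{(m)}},$$

where $N_n^{(m)} = \sup\{i \ge 0 : \Gamma_i^{(m)} \le n \mid \Gamma_0^{(m)} = 0\}$. It is clear that $n - m \le N_n^{(m)} \le n + m$.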

The input and output alphabets of the channel Pt m are 
X and Y respectively, same as those of P n . The transition 
probability PT m for the channel P? m is defined as in 
Equation (pj, but with the channel states defined by the process 
Z (m) A {^M}^. The transition probability of the 
process itself is defined by Equations ( fl4| > and 

We now establish a few properties of the $Z^{(m)}$ process and the approximating channels $P^{\dagger}_{n,m}$. We start with some properties of the state process $Z^{(m)}$ and the index process $\Gamma^{(m)}$ that will be useful in proving subsequent results. The following property, proved in Appendix VII, establishes the non-stationarity of the sequence of channels $\{P^{\dagger}_{n,m}\}_{m \ge 0}$.

Lemma 26 (Properties of $Z^{(m)}$): The state process $Z^{(m)}$ is a finite, time-inhomogeneous Markov chain. Moreover, the boundary states $\{\pm m\}$ are eventually absorbing states, under the measure $P$, in the following two cases.

(i) For $m = o(n)$ when $p_d \ne p_r$.

(ii) For $m = o(\sqrt{n})$ when $p_d = p_r$. ■

Lemma 27 (Refinedness of {r^} m >o) : F° r a fixed n £ 
N, for every to e Z + , 



(m) 



-,(m) 



m)i 



and 



{r w }n[n]c{rg m)] }n[n] a.s. 



w n[Blc{r frI }nH a - ; 



where {ru} denotes the set of elements of the random vector 
I\r, i e., where the random variables are not repeated. ■ 



The proof is given in Appendix VIII 



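Proposition 28: For every $m \in \mathbb{Z}_+$,

$$I(X_{[n]}; Y_{[N_n]}) \le I(X_{[n]}; Y^{(m)}_{[N_n^{(m)}]}), \quad \text{and}$$

$$I(X_{[n]}; Y^{(m+1)}_{[N_n^{(m+1)}]}) \le I(X_{[n]}; Y^{(m)}_{[N_n^{(m)}]}).$$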



The proof is left to Appendix IX. Intuitively, the above result is true because the "drift" between the input and the output processes is bounded by $m$ for the approximating channel $P^{\dagger}_{n,m}$, whereas it is unbounded for the DRC $P_n$ (or equivalently $Q_n$). The result below, which gives a total ordering of the sequence of channels $\{P^{\dagger}_{n,m}\}_{m \ge 0}$ in terms of their mutual information rates, follows immediately from Proposition 28.

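Corollary 29 (Total Ordering of $\{P^{\dagger}_{n,m}\}_{m \ge 0}$): For any $n \in \mathbb{N}$, the sequence $\{I^{\dagger}_{n,m}\}_{m \ge 0}$, where

$$I^{\dagger}_{n,m} \triangleq \frac{1}{n} I(X_{[n]}; Y^{(m)}_{[N_n^{(m)}]}),$$

is non-increasing. Since $I^{\dagger}_{n,m} \in [0, 1]$ $\forall$ $n \in \mathbb{N}$ and $m \in \mathbb{Z}_+$, $\lim_{m \to \infty} I^{\dagger}_{n,m}$ exists and is equal to $\inf_{m \ge 0} I^{\dagger}_{n,m}$. ■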

Proposition 30 (Information limits): For any n € N, we 
have 

I n = -I(X [n] :Y [Nn] ) = 4 = lira I 4. m = inf 7+ 

77 m-+oo ' m>0 

Consequently, for a stationary, ergodic input process X, 



Ix 
so that 



lim I n 

n— f oo 



inf /„ 

n>l 



4 = lim 4 

n— >oo 



inf ZI . 



ra>l 



sup/^ 

AT 



sup 4 

X 



for stationary, ergodic, Markov processes X. 

Proof: The last equality in the first line is from Corollary 



29 From Proposition 28 we have I n < l\ m V m G Z + , from 
which I n < I n follows. The equality is true because of the 



following. If we let j^V, 



a({X [n] ,Y^ m) }), the sigma- 

r {m) 



algebra generated by the random variables {X[ n ] , Y^^}, 



then {^ n ,m\m>Q is & filtration |20 §10.1], i.e., 

^n,m C & n ,m+l V m > 0. 
Thus, P(X[ n ], yj^fm),) i s the restriction of P to ^ n ,m- From 



[21 Theorem 2], we have that /„ = jt 



The limit of 7 n as n goes to infinity exists and is equal 
to the infimum of the sequence from the subadditivity of the 



sequence {n/„}„>i and Fekete's Lemma (cf. (122 
II]). The last claim made is true from Proposition 

Corollary 31: For any n G N, we have that 

C n = sup J„ = C'l = sup l\ 

P{X[n\) p(x w ) 

where C n is as defined in Theorem [2] Therefore 

C = lim C n = inf C n = C f = lim (A. 

n— yoo n>\ n— J-oo 



Although $\{P^{\dagger}_{n,m}\}_{m \ge 0}$ is a sequence of channels that approximate $P_n$, and have the properties discussed so far in this subsection, they are not useful as finite-state channels (FSCs), as shown below.

Lemma 32 (FSCs $P^{\dagger}_{n,m}$): For any $m \in \mathbb{Z}_+$, for a stationary, ergodic input process $\mathcal{X}$,

$$I^{\dagger}_{\mathcal{X}}(m) \triangleq \lim_{n \to \infty} I^{\dagger}_{n,m} = \mathcal{H}(\mathcal{X})$$

so that

$$C^{\dagger}(m) \triangleq \sup_{\mathcal{X}} I^{\dagger}_{\mathcal{X}}(m) = 1.$$

Proof: From Lemma 26, the states $\{\pm m\}$ are eventually absorbing for any $m \in \mathbb{Z}_+$. Hence, in the limit as $n \to \infty$, the channel only has a delay of $\pm m$, and hence the result. ■

We now attempt to obtain approximate channels that are stationary and useful as FSCs.

B. Approximate Stationary Channels 

Let $m \in \mathbb{Z}_+$. Fix $n \in \mathbb{N}$. Consider the channel $P^*_{n,m} = (\mathcal{X}, \mathcal{Y}, P^*_{n,m})$ where

$$Y_i^{(m)} = X_{\Gamma_i^{(m)}}, \quad i \in [N_n^{(m)}],$$

with $N_n^{(m)}$, $\Gamma_i^{(m)}$, and $Z_i^{(m)}$ as defined for the channel $P^{\dagger}_{n,m}$. The difference will be in the underlying measure $P^{(m)}$. Let the measure $P^{(m)}$ be such that the $Z^{(m)}$ process is a finite, time-homogeneous, first-order Markov chain with transition probabilities

$$P^{(m)}(Z_i^{(m)} = k \mid Z_{i-1}^{(m)} = j) = P(Z_i^{(m)} = k \mid Z_{i-1}^{(m)} = j)$$

when $-m < j < m$,

$$P^{(m)}(Z_i^{(m)} = k \mid Z_{i-1}^{(m)} = -m) = \begin{cases} 1 - p_r, & k = -m \\ p_r, & k = -m + 1 \\ 0, & \text{otherwise,} \end{cases}$$

and

$$P^{(m)}(Z_i^{(m)} = k \mid Z_{i-1}^{(m)} = m) = \begin{cases} 1 - p_d(1 - p_r), & k = m \\ (1 - p_d)(1 - p_r) p_d^{m-k}, & -m < k < m \\ (1 - p_r) p_d^{2m}, & k = -m \\ 0, & \text{otherwise.} \end{cases}$$

Note that the measure $P^{(m)}$ differs from $P$ only for state paths that reach beyond the states $\{\pm m\}$. The transition probabilities $P^*_{n,m}$ for the channel $P^*_{n,m}$ can now be defined as in Equation (5), but under the measure $P^{(m)}$. The stationarity of the channels $P^*_{n,m}$ follows from the time-homogeneity of the $Z^{(m)}$ process.
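As a concrete illustration of such a time-homogeneous state chain, the sketch below builds a $(2m+1) \times (2m+1)$ transition matrix on the states $\{-m, \ldots, m\}$. It rests on an assumption that is not spelled out in this subsection: that from state $j$ the unclipped process moves to $j + 1$ with probability $p_r$ (a replication) and to $j - d$ with probability $(1 - p_r)(1 - p_d) p_d^{d}$ for $d \ge 0$ (that is, $d$ deletions followed by a transmission), with any mass falling outside $[-m, m]$ lumped onto the nearest boundary state. The function names are hypothetical.

```python
def clipped_transition_matrix(m, p_d, p_r):
    """Transition matrix of a clipped state chain on {-m, ..., m}.

    Assumed unclipped dynamics (an illustration, see the lead-in):
      j -> j + 1  with probability p_r,
      j -> j - d  with probability (1 - p_r)(1 - p_d) p_d**d, d >= 0.
    Row/column index s corresponds to state s - m.
    """
    size = 2 * m + 1
    P = [[0.0] * size for _ in range(size)]
    for j in range(-m, m + 1):
        row = P[j + m]
        # Replication: move up by one, clipped at the upper boundary m.
        row[min(j + 1, m) + m] += p_r
        # d deletions then a transmission, while j - d stays above -m.
        for d in range(0, j + m):
            row[j - d + m] += (1.0 - p_r) * (1.0 - p_d) * p_d ** d
        # All remaining mass (d >= j + m) is lumped onto the state -m.
        row[0] += (1.0 - p_r) * p_d ** (j + m)
    return P


def mat_mul(A, B):
    """Plain-Python product of two square matrices of equal size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def all_positive_after(P, steps):
    """True if every entry of P**steps is strictly positive (steps >= 1)."""
    M = P
    for _ in range(steps - 1):
        M = mat_mul(M, P)
    return all(x > 0.0 for row in M for x in row)
```

For $(p_d, p_r) \in (0, 1)^2$, checking that some power of the matrix is entrywise positive is one simple way to see numerically that the chain is irreducible and aperiodic under these assumptions.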

Remark 3: Note that the sequence of sigma-algebras 
{^m}m>o where Sf m = cr(Z' m )) forms a filtration. The 
sequence of measures P( m ) as defined above seem to be 
defined only on the corresponding sigma-algebras £f m s for 
each 777 € Z + . However, we can extend these measures to 
the sigma-algebra 53 as in Appendix [X] and will henceforth 
consider P( m ) '■ & l— > [0, 1] for each m € Z + . □ 

The lemma below shows that for a fixed m G Z + , the FSC 

4.6]. 



P* 



is an indecomposable FSC [23 



Lemma 33 fP* m Indecomposable): The FSC P* m is in- 
decomposable for every m € Z + for (pcbPr) € (0, l) 2 . 

Proof: Fix m G Z + . We need to make a couple of 
modifications to put the channels {P* m } n >i in the parlance 
of discrete FSCs. First, we set 



= X 



-m-zy 



= X 



-z) 



for i G [n]. 
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Note that z\ m) = m + zH. G [0 : 2m], and hence the we have 
channel producing Y^?' is "causal". Let the "state" of 
the channel P* rn at time % g [n] be defined as 

W\ m) = (X [l _ 2m ..^ 1] Jt ) ) e X 2m x [0 : 2m], 



17^;^) -/(x w; y^|^)l 

< log 2 ((2m- 
= 2mlog 2 |X| 



where we set Xi — for i £ [n] . Note that we need to redefine 
the state of the channel in this case to keep the factorization 



p {m) <# m \w$\x it wl m) ) 



Therefore 

C*(m) 



lim 



sup 



l)|X| 2m 
f log 2 (2m 



-KX^YfflW^) 



!)• 



P {m) {Y} m) \Xi, Wl m> ) ■ P {m) {W^\Xi,Wy n) )- From (23] Theorem 4.6.4], the quantity on the right hand 



r (m) 



Since Z^ m%> is a finitely delayed, finitely shifted version of 
Z( m \ and because Z^ m ' is an irreducible, aperiodic Markov 
chain under the measure P( m ) |24, Chapter 1] as long as 
(pd,p r ) G (0, l) 2 , so is Z^ m \ In particular, we have that for 
every i > 2m, 



mm P( m )(Z, 



z') > V z' G [0 : 2m]. 



This implies that for i = 2m and x G X, by choosing w 

(£[0:2771-1]! z) for any z G [0 : 2m], we see that 



P(m> (^ 2 ( : i) = w\x = x, wr =w')>o 

for every w' G X 2 " 1 x [0 : 2m]. From (23) Theorem 4.6.3], 
we have the desired result. ■ 

Remark 4: Note that in the description of the causal channel 
in the proof above, we have discarded the part of the output 



7^ W( m ) 



Y 



(m) 



, by considering the causal output y£T\ This 

] L n J 

will however not matter in the estimation of the information 
rate since 



Wi-m+l-.Ni, '] 



< -I(X M ;Y(™1, ) - -I(X M -Y, { ^) < 
~ n V W WL m) ] J n y [nJ ' W ' ~ n' 

and since m G Z + is fixed, the rates are the same in the limit 
as n goes to infinity. □ 

Corollary 34 (Capacity of FSC P* m ): For m G 1 + and 
the sequence of FSCs {P£ m } n >i with (pd,p r ) G (0, l) 2 , the 
capacity is given by 



1 



C*(m)= lim sup -I{X [n] ;Y^ m) ) 

= lim sup 7* 

«->°°P <m> (x w ) ' 



Proof: From Lemma 33 and Remark |4] we have C*(m) 
as defined in the statement can be written as 



1 



C*(m)=lim sup -J(X [n] ;Y£ 



n ^°°p (m >(x H ) " 



Now since 



(m)N 



(^^r^ + ^N;^!^) 

7(^5^1^), 



(m), w/ (m)x 



Set = by convention. 



side of the above equality exists and is the capacity of the 
indecomposable FSCs {P£ m }n>i. ■ 

Corollary 35 (Capacity of ' P* m ): For the FSCs 
{P* m }„>i, the capacity C*(m) can be written [22 1 



as 



C*(m)=sup lim ±I(X [n y,Y^ ) 



sup lim I* 



= supT^(m) 

X 



where the supremum is over all stationary, ergodic input 
sources X. ■ 

From Lemma|33] since {P* m } n >i are indecomposable FSCs, 
we have from [25] that 



log 2 P(m 



( X [n] 



lim 



H(X [n]) Y^l h ) 



u(x,y 



log 



= n(y 



H{Y {m l, 



(m)\ 



as n 



oo a.s., where the entropies are calculated with respect 
Therefore 



to the measure Pi m ) 



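$$I^*_{\mathcal{X}}(m) = \mathcal{H}(\mathcal{X}) + \mathcal{H}(\mathcal{Y}^{(m)}) - \mathcal{H}(\mathcal{X}, \mathcal{Y}^{(m)})$$

can be estimated numerically using the forward passes of the BCJR algorithm [26] to estimate $\mathcal{H}(\mathcal{X}, \mathcal{Y}^{(m)})$ and $\mathcal{H}(\mathcal{Y}^{(m)})$, as in [27], [28]. Moreover, optimizing Markov input sources numerically is possible [29], [30] for these FSCs.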
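The forward-pass idea can be sketched on a generic finite-state hidden Markov model. The code below is a minimal illustration under stated assumptions, not the FSC of this section: the state emits a symbol with probability `emit[s][y]` and then moves according to `trans[s][t]`, and all names are hypothetical. The normalized forward recursion accumulates $\log_2 P(y_1, \ldots, y_n)$, and $-\frac{1}{n} \log_2 P(y_1, \ldots, y_n)$ over one long simulated realization serves as a sample-based entropy rate estimate.

```python
import math


def forward_log2_prob(obs, init, trans, emit):
    """Return log2 P(obs) for a finite-state hidden Markov model.

    init[s]     : probability that the first state is s
    trans[s][t] : probability of moving from state s to state t
    emit[s][y]  : probability that state s emits symbol y
    Normalizing the forward vector at every step avoids underflow.
    """
    n_states = len(init)
    alpha = list(init)
    log2_p = 0.0
    for y in obs:
        # Weight the state distribution by the emission likelihood.
        weighted = [alpha[s] * emit[s][y] for s in range(n_states)]
        c = sum(weighted)
        if c == 0.0:
            return float("-inf")  # the sequence has zero probability
        log2_p += math.log2(c)
        # Normalize, then propagate one step through the state chain.
        alpha = [sum(weighted[s] / c * trans[s][t] for s in range(n_states))
                 for t in range(n_states)]
    return log2_p


def entropy_rate_estimate(obs, init, trans, emit):
    """Sample-based estimate -(1/n) log2 P(obs), in bits per symbol."""
    return -forward_log2_prob(obs, init, trans, emit) / len(obs)
```

To estimate a joint entropy rate in the same way, one would treat each input-output pair as a single observed symbol; how the channel state is encoded for that purpose is a modelling choice that this sketch leaves open.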

In Fig. 3 we plot the SIRs, $C^*_{\mathrm{i.u.d.}}(m)$, for the indecomposable FSCs $\{P^*_{n,m}\}_{n \ge 1}$ obtained through numerical simulations for $1 \le m \le 8$ and $p_d = p_r = p \in [0, \frac{1}{2}]$. The value of $n$ used for the estimation was $5 \times 10^5$. The error in estimation is consequently upper bounded by 0.15%.

A couple of observations are worth noting. First, the SIRs $\{C^*_{\mathrm{i.u.d.}}(m)\}_{m \ge 0}$ are non-increasing. This hints at a total ordering of the FSCs $\{P^*_{n,m}\}_{m \ge 0}$ with respect to the information rates similar to what we had in Corollary 29. Second, we see that for small values of $p$, the SIRs get bunched up as $m$ increases, i.e., the SIRs $C^*_{\mathrm{i.u.d.}}(m)$ converge quickly, so that we have a good estimate of

$$C^*_{\mathrm{i.u.d.}}(\infty) \triangleq \lim_{m \to \infty} C^*_{\mathrm{i.u.d.}}(m)$$

for $p$ close to 0.
Fig. 3. SIR estimates for the FSCs $\{P^*_{n,m}\}_{n \ge 1}$ with $p_d = p_r = p \in [0, \frac{1}{2}]$ for different $m$ values are shown in solid lines. The lower bound on the capacity of the SDRC from Corollary 12 is also shown as the dashed line.



Proposition 36: For n G N, we have 



lim inf /* 



I-n, 



Thus, 



C = sup inf lim inf I* 



,v 



n>l m— foo 



Proof: For a fixed n G N, since we have that 



(m) 



P(X[ n i, V[jv„]) asm4oo for every 



G S, P {m) (X [n] ,Y^ m) ) converges to P(X [n] , Y [Nn] ) in 
total variation as m — > oo. Consequently, from [31 Corollary 
1'], we have the desired result. ■ 

Remark 5: We conjecture that 7* — > /* = /„ as 
m — > oo. From |3l] Corollary 1'], one needs to show uniform 

for 



integrability of the information densities i(X[ n ],Y 



(m) 



the conjecture to be true. Alternatively, if the sequence of 
channels {P* m } m >o is totally ordered for every n G N with 
respect to the mutual information rates, as was the case for the 
sequence {Pj, m } m >o (cf. Corollary I29X i.e., if {/* m } m >o 



is a non-increasing sequence for every n G N, then we know 
that 



■n— >oo ' 



1* 

n> 



and from Proposition 36 I* m I I n follows. Unfortu- 
nately, we are not able to show this monotonicity in the 
sequence {I* m } m >o as we argued in the case of the sequence 
{At m }m>o- Although the refinedness property of the r pro- 
cess (cf. Lemma |27) still holds, the different measures P( m ) 



being used for each m G Z + do not allow us to generalize 



the result of Corollary [29] However, Fig. [3] provides sufficient 
empirical evidence for this monotonicity conjecture. □ 



C. Approximating Channels for the SDRC 

In this subsection, we consider the SDRC, i.e., the case when $p_d = p_r = p \in [0, 1)$. This channel is of interest since in practice, systems prone to mis-synchronization are usually not biased to produce more deletions or replications. For the case of the SDRC, we can fix $m$ to be a function of $n$ satisfying a simple condition and define a sequence of approximating channels.

Lemma 37: For the SDRC, for every $n \in \mathbb{N}$, let $m \in \mathbb{N}$. Then,

$$P\left( \max_{i \in [N_n^{(m)}]} |Z_i| > m \right) = O\!\left( \frac{n + m}{m^2} \right).$$



We relegate the proof to Appendix XI. The significance of the above result can be seen by noticing that, for the SDRC, the probability (under measure $P$ or $P^{(m)}$) with which the approximating channels introduced in the previous two subsections differ from the actual channel can be made arbitrarily small by setting $m(n) = \omega(\sqrt{n})$, i.e., $\lim_{n \to \infty} \frac{m(n)}{\sqrt{n}} = \infty$, and choosing a large enough $n$. For the so-chosen sequence of approximating channels, we can conclude that the limiting channel characterizes the SDRC from the following result, whose proof is left to Appendix XII.



Proposition 38 (Approximating SDRC): For the SDRC,

$$I_{\mathcal{X}} = \lim_{n \to \infty} I_n = \liminf_{n \to \infty} I^*_{n,m(n)}$$

where $m(n) = \omega(\sqrt{n})$, for stationary, ergodic input process $\mathcal{X}$. ■

The channels $\{P^*_{n,m}\}_{m \ge 0}$ give us a way to approach the problems of optimizing input distributions as well as designing coding schemes for the SDRC. We can optimize the inputs of $P^*_{n,m}$, starting with small values of $m$, under some input assumptions, e.g., for fixed-order Markov inputs [29], [30]. Note that the numerical estimation of $I^*_{n,m}$ is possible (as described in the previous subsection) only when $m < n$, since setting the channels as indecomposable FSCs (cf. Lemma 33) is possible only in this case. Moreover, for a good estimate of the information rate, we will require $m \ll n$. For the SDRC, Proposition 38 allows us to consider some $P^*_{n,m(n)}$, where $m(n)$ is both $\omega(\sqrt{n})$ as well as $o(n)$, for which a good estimate of the information rate $I^*_{n,m(n)}$ can be obtained. Note that due to the lack of a result analogous to Lemma 37 in the case of a general DRC for $m < n$, generalizing these arguments when $p_d \ne p_r$ is not completely justified.

Starting with some small values of $m$, we expect that the information rates and optimal distributions quickly converge (in $m$), giving us a way to characterize optimal inputs for the SDRC $P_n$. For small values of $p$, as in Fig. 3, the information rates for the SDRC can be characterized numerically for moderate values of $m$ (much smaller than the $\omega(\sqrt{n})$ guaranteed by Lemma 37). For optimizing the input distribution for an approximation $P^*_{n,m}$, we can start with optimizing inputs that are $\mu$-th order Markov processes, for $\mu \ge 1$. As was observed⁴ in [32], the convergence of optimal information rates as a function of the order $\mu$ of the input Markov process is expected to be rapid. The authors in [32] hypothesized that this convergence was exponential in $\mu$. Similar "diminishing returns" on increasing $\mu$ have also been observed by others [29], [30]. We think that a similar rapid convergence of $I^*_{n,m}(\mathcal{X}^*_{M_\mu})$ to $C_n(\mathcal{X}^*_{M_\mu})$ also holds for $m$, where $I^*_{n,m}(\mathcal{X}^*_{M_\mu})$ is the optimal information rate achieved by a $\mu$-th order Markov input process on the FSC $P^*_{n,m}$ and $C_n(\mathcal{X}^*_{M_\mu})$ is the optimal information rate achieved by a $\mu$-th order Markov input process on the SDRC $P_n$. We use the generalized Blahut-Arimoto algorithm presented in [30] to evaluate $I^*_{n,m}(\mathcal{X}^*_{M_\mu})$ for some small values of $m$ and $\mu$. Figure 4 plots these estimates, which illustrate the aforementioned observations. Note that it is clear from the plots that Bernoulli equiprobable inputs achieve rates (SIR) comparable to higher-order Markov inputs for small probabilities of deletion and replication. However, unlike ISI channels [33], it is not clear that Markov sources of increasing orders achieve capacity. It is also evident from the figure that for small values of the deletion-replication probabilities, the information rates seem to converge as $m$ increases even for small values of $m$ ($m = 3$). This suggests that Figure 4 plots good estimates of $I^*_{n,m}$ for such channels and, since the values of $n$ chosen are large, that they indeed represent good estimates of the Markov capacity of the SDRC.

Apart from the above advantage of facilitating numerical estimation of information rates, the approximating channels $P^*_{n,m}$ have another important advantage. Since they have immediate factor-graph interpretations, there is a possibility of constructing sparse graph-based coding schemes and decoding over the joint graphical model representing the channels as well as the codes, as was done for joint detection and decoding of LDPC codes on partial response channels [34]. Instead of trying to build codes for the SDRC $P_n$, the problem can be reduced to designing good codes and efficient decoding schemes for the FSCs $P^*_{n,m}$ for small values of $m$. For small deletion-replication probabilities $p$, which is the case of interest in practice, we can expect these codes to perform well for the SDRC $P_n$ as well.

Fig. 4. Numerical estimates of $I^*_{n,m}(\mathcal{X}^*_{M_\mu})$ for $m = 1, 2$ and $3$, and $\mu = 2m$ (solid lines). For the case where $m = 3$, $\mu = 6$, only the estimates for small deletion-replication probabilities have been evaluated. We chose $n = 10^6$ for $m = 1, 2$ and $n = 10^5$ for $m = 3$. The smaller value of $n$ in the case of $m = 3$ was chosen for computational convenience. Also shown (in dashed lines) for comparison are the corresponding estimates of the SIR ($\mu = 0$) from Figure 3.

⁴ Although the validity of the bounds in [32] is unclear (see, e.g., [10]), the rapid convergence of information rates as a function of the order $\mu$ of the input Markov process is expected to be true.



VI. Generalizations 

In this section, we discuss different scenarios that can be 
modeled using the channel model introduced in Section III 
with appropriate modifications. Wherever possible, methods 
for information theoretic analysis for these cases through 
generalizations of the channel model presented in this paper 
are highlighted. 

A. Channels introducing Random Insertions 

The channel model of Equation (6) allows us to handle deletions as well as replications. However, the class of SECs that introduce random insertions cannot be written in the form of Equation (6). A suitable modification for our model in this scenario is to let

$$Y_i = X_{i - Z_i} \oplus \mathbb{1}_{\{Z_i = Z_{i-1} + 1\}} V_i$$

where $\mathcal{X} = \mathcal{Y} = \mathcal{V} = \{0, 1\}$, and $V = \{V_i\}_{i \ge 1}$ is a Bernoulli sequence with parameter $f$ (for "flip"). This means that the probability of a random insertion is $f p_r$ and that of a replication is $(1 - f) p_r$. Note that this can be easily generalized to any finite set $\mathcal{X}$, and arbitrary sets $\mathcal{Y}$ and $\mathcal{V}$ (with an appropriate notion of the addition operation "$\oplus$"). Analysis of this channel is, however, more complicated than analyzing the DRC itself due to the cascaded additive noise channel which also depends on the "shared" state process $Z$. However, when the channel produces deletions, replications, random insertions as well as substitutions, we can write

$$Y_i = X_{i - Z_i} \oplus V_i$$

which is just a cascade of the DRC and an additive noise channel. In the binary setting, this corresponds to a channel that deletes a bit $x$ with probability $p_d$, or inserts a sequence $\mathbf{y}$ with probability $(1 - p_d)(1 - p_r) p_r^{|\mathbf{y}| - 1} f^{w(\mathbf{y} \oplus x)} (1 - f)^{|\mathbf{y}| - w(\mathbf{y} \oplus x)}$. This implies that the substitution error probability for a bit is given by

$$(1 - p_d)(1 - p_r) f.$$

The capacity (or the information rate achievable by a given input process) and coding for a cascade of binary, memoryless channels without synchronization errors has been studied in, e.g., [35]-[38]. Some lower bounds on the capacity of a cascade of a BDC and an additive noise channel were given in [16], [39]. The possibility of extending these results to general memoryless SECs using the model presented here remains a problem worth exploring.

B. SECs with Memory 

An SEC with memory is defined as in Definition [T] with 
the only difference being that the transition probabilities 
Qn(y[ n }\ x ln]) do not factorize as products of individual prob- 
abilities q^y^Xi), As an example, one could think of an SEC 
where having a deletion for a symbol influences the likelihood 
of the following symbol being deleted, i.e., a channel that in- 
troduces a burst of deletions. Similarly, channels that introduce 
a limited number of deletions in every input subsequence of 
a certain length could also occur in practice. These have been 
studied under the name of segmented deletion channels |40|, 
[41 1 . Note that the definition of the segmented deletion channel 
is slightly different in the references cited, where it is assumed 
that the input is divided into blocks of a certain length and at 
most one deletion occurs within each block. Our definition is 
more general and corresponds closer to reality. 

The channel model in Equation (|6]i generalizes readily 
to the case of DRC with memory. Consider the Z process 
to be a non-increasing (so that only deletions occur), time- 
homogeneous, shift-invariant Markov process of order z > 2 
such that 



P(Zi = Zi\Z[i 



> 



only for ^r— z: i— i] such that z;_ z = Z{ 



H-j+i 



for some 1 < j < z, and z. 



z i-j + l 



> 
< 



1. Then, clearly at most one deletion occurs for every input 
subsequence of length z. The model in Equation |6]i will then 
correspond to a segmented deletion channel where no more 
than 1 deletion occurs for every z input symbols. Similarly, we 
can model other DRC with memory by suitably considering 
the Z process to be a Markov process of some order with 
specific transitions occuring with non-zero probability. 

Although we have let z > 2, not all second-order Markov 
processes Z result in SECs with memory. One example worth 
noting is when the Z process is non-decreasing with incre- 
ments of at most 1, and is such that two consecutive increments 
occur with probability 0. This results in a replication channel 
where each symbol is transmitted noiselessly and possibly 
replicated once — this is exactly the elementary sticky channel 
introduced in AS), which is a memoryless SEC. We will refer 
to such channels that introduce a bounded number of inserted 
symbols per input symbol as bounded, memoryless SECs. This 
particular channel has also been studied in (T6) , where some 
analytical lower bounds on the capacity were given. Another 
example where z = 2 does not result in an SEC with memory 
but is a bounded, memoryless SEC is Gallager's model fT) of 
the insertion-deletion channel. Some analytical lower bounds 
for the capacity of this channel (without deletions) were 
given in [16|. Achievable rates for a bounded, memoryless 
SEC were studied in [42], and those for a cascade of a 
bounded, memoryless SEC with an inter-symbol interference 
(ISI) channel in (43). Some bounds on the capacity of a 



bounded, memoryless SEC with substitution errors were given 
in (44). 

Note that the channel coding theorem for SECs with memory has not been established. The various works on the "capacity" of such channels are an indication of such SECs occurring widely in practice. Establishing the channel coding theorem for SECs with memory is, therefore, important both for the theory and in practice. For SECs with memory, since the channel model $Q$ will have transition probabilities that still factorize as in Equation (5) (with the channel state transition probabilities replaced by the higher-order transition probabilities), it is more amenable to analysis and could potentially be used to establish the channel coding theorem.

C. Jitter, Bit-shift and Grain-error Channels 

Channels that consider mis-synchronization due to jitter or bit-shifts have been studied in the context of magnetic recording and constrained coding [45]-[47]. These represent a variant of the general model of the DRC presented in the present paper. In particular, they are characterized by a $Z$ process where each valid state path $z \in \mathcal{Z}$ has increments and decrements of size at most 1, and the transition probabilities are data-dependent. The zeros and ones in the input correspond to the absence and presence, respectively, of a transition in the signal. Thus, the presence of a transition cannot be deleted, i.e., a 1 in the input stream cannot be deleted, whereas the 0s can be deleted or replicated (at most once). The authors of [45] gave bounds on the capacity and the zero-error capacity of bit-shift channels and also presented some bounds on achievable rates over a concatenation of the bit-shift channel with a binary symmetric channel. Similar analysis was performed in [46] for discrete and continuous channels with timing jitter. Numerical upper and lower bounds on the capacity of a binary channel with jitter where transitions could "cancel" each other were given in [47].

Another class of channels that resemble these channels are the "paired" insertion-deletion channels studied in the context of bit-patterned media recording [48]. Here, the channel is similar to the approximating FSC given in Section V-B with $m = 1$. In [48], the authors give bounds on the capacity and the zero-error capacity of the channel for varying sizes of the state space. A further specialization of this channel is the one-dimensional granular media recording channel. This has also been studied in [49], where some bounds on capacity and coding constructions have been proposed.

D. Permuting and Trapdoor Channels 

The trapdoor channel introduced by Blackwell (see [50, §7.1]) is a channel where the input stream is fed to a buffer at the same rate as symbols from within the buffer are randomly drawn as the output stream. Using our model, we can define the trapdoor channel as follows. The multiset of indices of the buffer contents at time $i \ge 1$ is denoted as $B_i = \{\beta_1, \ldots, \beta_b\}$, which is of size $b$. We initialize

$$B_1 = \{\underbrace{0, \ldots, 0}_{b-1}, 1\}$$

and define the output at the $i$-th instant as $Y_i = X_{\Gamma_i}$ for $i \in [n]$, where $\Gamma_i$ has the distribution $P(\Gamma_i = \beta_j) = \frac{1}{b}$, $1 \le j \le b$. The buffer multiset is updated as $B_{i+1} = B_i \setminus \{\Gamma_i\} \cup \{i + 1\}$. In this case, a further simplification of the channel model might be more useful since the channel depends not on the indices of the inputs in the buffer, but on the type [18, §12.1] of the buffer contents at any time. This channel was generalized to define permuting channels in [51].
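The buffer update described above can be sketched as a short simulation. This is a minimal illustration under stated assumptions: index 0 stands for an initial buffer symbol `x0` (a hypothetical parameter, taken as 0 by default), the buffer slot to be emitted is drawn uniformly at random, and the function name is introduced here for illustration only.

```python
import random


def trapdoor_channel(x, b, x0=0, rng=None):
    """Simulate one pass of the input sequence x through a size-b trapdoor.

    The buffer holds input indices. It starts as b - 1 copies of index 0
    followed by index 1. At time i one slot is drawn uniformly at random,
    the symbol at that index is emitted, and index i + 1 takes its place.
    Index 0 refers to the assumed initial symbol x0.
    """
    rng = rng or random.Random()
    buffer = [0] * (b - 1) + [1]
    y = []
    for i in range(1, len(x) + 1):
        slot = rng.randrange(b)           # uniform choice among b slots
        gamma = buffer[slot]              # index of the emitted symbol
        y.append(x0 if gamma == 0 else x[gamma - 1])
        buffer[slot] = i + 1              # next input index enters
    return y
```

With `b = 1` the buffer holds only the current input, so the output equals the input. For larger `b` the output is a randomly delayed rearrangement of the input in which each input symbol appears at most once.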

Although the trapdoor channel is described easily, its capacity, even in the simplest case of $|\mathcal{X}| = 2$ and $b = 2$, has been an open problem ever since its introduction. In [52], the authors considered coding schemes for certain non-probabilistic models of the trapdoor channel. The capacity of the probabilistic trapdoor channel when $|\mathcal{X}| = 2$ and $b = 2$ is known to satisfy [53]

$$\frac{1}{2} \le C(|\mathcal{X}| = 2, b = 2) \le \log_2 \frac{1 + \sqrt{5}}{2} \approx 0.694241914.$$

It is worthwhile to explore the possibility of obtaining better bounds on the capacity of the trapdoor and permuting channels using the model presented in this paper.



E. Molecular Communication and Chemical Channels 

A simple model for molecular communication or chemical
channels is as follows. The channel state $Z_i$ at time instant $i$
is a random variable on the alphabet $\{0\} \cup [m]$ and represents
the delay introduced to the input at time $i$. The output at time
$i$ is given as

$$Y_i = \sum_{z=0}^{m} X_{i-z} \, \mathbb{1}\{Z_{i-z} = z\},$$

i.e., the output is the sum of all the channel inputs that arrive
at time $i$. This channel was studied in [54] as a delay selector
channel and a lower bound on the capacity was given assuming
that the state process is i.i.d. In general, the state process can
be modeled as a Markov process, and the channel might be
amenable to a similar analysis as presented here.
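
As a rough illustration of this delay model, the following sketch adds each input to the output slot at which it arrives. The function name `delay_selector_channel` and the choice of i.i.d. delays drawn uniformly from $\{0, \ldots, m\}$ are assumptions for the example only; the source merely requires the state to take values in $\{0\} \cup [m]$.

```python
import random

def delay_selector_channel(x, m, seed=None):
    """Return the output sequence when input X_i is delayed by Z_i in {0, ..., m}."""
    rng = random.Random(seed)
    y = [0] * (len(x) + m)            # room for the largest possible delay
    for i, xi in enumerate(x):
        z = rng.randint(0, m)         # assumed i.i.d. uniform delay Z_i
        y[i + z] += xi                # the input sent at time i arrives at time i + z
    return y
```

Since every input arrives exactly once, the output sum always equals the input sum, and `m = 0` gives back the input unchanged.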



F. Timing Channels 

There is a link between discrete timing channels [55], where
information is communicated not only in the signals but also
in the timing of the signals and the randomness is in the arrival
times of the signals, and "good" transmission sequences for
SECs. This is in the sense that an information-bearing
transmission sequence for an SEC must not only be able to carry
information within the sequence, but also contain information
in the ordering of the symbols within the sequence, such that
even in the presence of synchronization errors, the information
about the symbol ordering is not completely lost. That is to
say that the sequences $X_{[n]}$ must be such that under a limited
number of synchronization errors, the received sequence $Y_{[N_n]}$
must convey adequate information about the state sequence
$Z_{[N_n]}$. Therefore, it might be of importance to study whether
methods of coding over timing can be used to obtain efficient
codes and decoding schemes for the SECs.



VII. Conclusions 

We introduced a new channel model for a class of SECs 
which formulated the SEC as a channel with states. This 
allowed us to obtain analytical lower bounds for the capacity 
of SECs with only deletions or only replications. For the case 
of the BDC, we were able to write the SIR in terms of subse- 
quence weights of binary sequences. Subsequence weights are 
known to be a quantity of interest in the maximum-likelihood 
decoding of sequences for the BDC (cf. Equation (11)).
Moreover, it is clear from Equation (11) that the dependence
of information rates for the BDC on the input statistics only 
appears in the term Sjm , whereas the subsequence weights 
influence H(x) independently of the input statistics. Thus, our 
result establishes a natural link between the capacity of the 
BDC and the metric relevant for ML decoding. We were also 
able to obtain lower bounds on the capacity of the BDC that 
are known to be tight for small deletion probabilities. For the 
BRC, we were able to exactly characterize the Markov-1 rate, 
which is, to the best of our knowledge, the first analytic lower 
bound on the capacity of the BRC. In doing so, we were able 
to disprove the conjecture that the capacity of SECs is a convex 
function of the channel parameters, at least in the case of the 
BRC. 

For the case of an SEC with deletions and replications, we 
were able to provide a sequence of approximating FSCs that 
are totally ordered with respect to the mutual information rates 
achievable, and therefore, with respect to capacities. These 
approximating FSCs were shown to be such that the mutual 
information rate achievable for the SEC was equal to the limit 
of the mutual information rates achievable for the sequence of 
FSCs. To obtain numerical estimates of achievable rates on the 
DRC, we defined another sequence of indecomposable FSCs. 
Computing the mutual information rates for this sequence of 
FSCs allows us to relate the mutual information rate for the 
DRC to the limiting value of the mutual information rates of 
the sequence. For the particular case of the SDRC, we were 
able to show a stronger form of convergence of these mutual 
information rates. 

The formulation in this paper not only allows us to get
estimates of mutual information rates achievable on SECs
but also gives some insight into possible code constructions
and decoding schemes for such channels. The approximations
introduced for the DRC give us a natural way to reduce these
problems. One would therefore obtain progressively better
performing codes for the DRC by designing good codes for
the sequence of approximating FSCs. We expect that for
small values of the deletion-replication probability, a code
constructed for an approximation with a moderate value of $m$
will perform well over the DRC as well. Some coding schemes
for special cases of the FSCs (with $m = 1$) have been known in
various contexts (see Section VI-C). Extending these schemes
to better approximations (larger $m$ values) will prove crucial
in designing good codes for the DRC. We emphasize that
although the present paper considers only binary SECs, the
results extend naturally to the case of larger finite alphabets.
The expressions for information rates will perhaps become
more complicated, but the methods to arrive at their bounds
or numerical estimates remain unchanged.

The present formulation of the SECs also allows us to make 
the following remarks on the BDC. 

• In [17], the authors conjectured that the capacity of the
BDC has a Taylor-like series expansion. We see from
Theorem 20 that this is true for the SIR of the BDC. We
expect that the capacity also has a similar formulation.

• The capacity of a general SEC might not be convex in 
the channel parameters (See Remark |2j. It was shown in 
[19] that the capacity $C_{\mathrm{BDC}}$ of the BDC satisfies

$$\inf_{p \in [0,1]} \frac{C_{\mathrm{BDC}}(p)}{1-p} = \lim_{p \to 1} \frac{C_{\mathrm{BDC}}(p)}{1-p}.$$

It is therefore expected that $C_{\mathrm{BDC}}(p)$ is convex in $p$. From
Theorem 20, we see that the SIR $C^{\mathrm{iud}}_{\mathrm{BDC}}$ of the BDC can
be written as

$$C^{\mathrm{iud}}_{\mathrm{BDC}} = 1 - p - h_2(p) + \lim_{\nu \to \infty} \lim_{i \to \infty} \sum_{m=0}^{\nu} f(i, m, p),$$

where $f(i, m, p) = \psi_{i,m}\, p^m (1-p)^i$ is non-convex in $p$
for $m \geq 1$. It is interesting to see if this double limit
turns out to be convex despite the partial sums
$\sum_{m=0}^{\nu} f(i, m, p)$ being non-convex for every $\nu \geq 1$.
Extending this to the case of the capacity $C_{\mathrm{BDC}}$ is also of interest.
• In order to obtain bounds for the capacity of a BDC for
$p$ close to 1, one might typically consider the case where
all but one (or a few) symbols are lost. The lower bound
$D_2$ presented here (cf. Lemma 15) corresponds to this
situation. However, since we considered this bound for a
first-order Markov input, the bounds we obtained did not
prove to be useful for $p$ close to 1. It might therefore
be of interest to generalize this bound for a high-order
Markov input, which might give us a strictly positive (and
thereby non-trivial) achievable rate.

Appendix I 
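Proof of Lemma 6

(i) This is true since $p_r < 1$.

(ii) This is true since $p_d < 1$.

(iii) Notice that, for each $n \in \mathbb{N}$, we can write

$$\Gamma_n = \sum_{i=1}^{n} \Lambda_i,$$

where $\{\Lambda_i\}_{i \geq 1}$ is an i.i.d. process with

$$P(\Lambda_i = s) = \begin{cases} p_r, & s = 0 \\ (1-p_r)(1-p_d)\, p_d^{s-1}, & s \geq 1. \end{cases}$$

From the strong law of large numbers (SLLN), we
therefore have $\frac{\Gamma_n}{n} \to E(\Lambda_1)$ a.s. as $n \to \infty$. We also
have $N_n \to \infty$ a.s. as $n \to \infty$ from point (ii) above.
Therefore, $\frac{\Gamma_{N_n}}{N_n} \to E(\Lambda_1)$ a.s. as $n \to \infty$. Further, by
definition, we have $\Gamma_{N_n} \leq n < \Gamma_{N_n+1}$, i.e.,

$$\frac{\Gamma_{N_n}}{N_n} \leq \frac{n}{N_n} < \frac{\Gamma_{N_n+1}}{N_n+1} \cdot \frac{N_n+1}{N_n}.$$

Thus

$$\frac{N_n}{n} \to \frac{1}{E(\Lambda_1)} = \frac{1-p_d}{1-p_r}$$

a.s. as $n \to \infty$.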
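
The limit $(1-p_d)/(1-p_r)$ can be checked empirically. The sketch below assumes a per-symbol mechanism in which each input is deleted with probability $p_d$ and, if it survives, is emitted once and then replicated again with probability $p_r$ after every emission; the function name `simulate_output_length` and this particular sampling order are illustrative assumptions, not the paper's definition.

```python
import random

def simulate_output_length(n, p_d, p_r, seed=0):
    """Count the output symbols N_n produced by n input symbols."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        if rng.random() < p_d:
            continue                  # the input symbol is deleted
        count += 1                    # the symbol is received once
        while rng.random() < p_r:
            count += 1                # each further replication occurs w.p. p_r
    return count

n, p_d, p_r = 100_000, 0.2, 0.3
ratio = simulate_output_length(n, p_d, p_r) / n
limit = (1 - p_d) / (1 - p_r)
```

For large `n`, `ratio` should be close to `limit`, in line with the almost-sure convergence stated above.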



Appendix II 
Proof of Lemma 7

(i) By definition, $Z$ is a first-order Markov chain.
Time-homogeneity implies that

$$P(Z_i \mid Z_{i-1}) = P(Z_1 \mid Z_0) \quad \forall\, i \geq 1.$$

This is true for the state process $Z$ from the definition
since the transition probabilities in Equation (7) do not
depend on the time index $i$. Shift-invariance implies

$$P(Z_1 = z_1 \mid Z_0 = z_0) = P(Z_1 = z_1 - z_0 \mid Z_0 = 0).$$

This is true because the state transition probabilities in
(7) depend only on the difference $z_i - z_{i-1}$.

The $\Gamma$ process inherits these properties from $Z$ through
the bijection $\zeta : \mathbb{Z}^n \to \mathbb{Z}^n$, where with some abuse of
notation, we write $\Gamma_{[n]} = \zeta(Z_{[n]}) = (\zeta(Z_i))$, $i \in [n]$,
with $\Gamma_i = \zeta(Z_i) = i - Z_i$, $i \in [n]$, $\forall\, n \in \mathbb{N}$.

The irreducibility and aperiodicity of the $Z$ process
follow from the definition.

(ii) Note that from Equation (7), $Z_{i+1} \leq Z_i + 1$ a.s. for every
$i \geq 0$. Hence $Z_{i+j} \leq Z_i + j$ a.s. for every $i \geq 0$, $j \geq 0$.
Since $\Gamma_i = i - Z_i$, we have $\Gamma_{i+j} = i + j - Z_{i+j} \geq
i + j - Z_i - j = \Gamma_i$ with probability 1.

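(iii) From the bijection $\zeta$ (see point (i) above) and Appendix
I, we have

$$H(\Gamma_i \mid \Gamma_{i-1}) = H(Z_i \mid Z_{i-1}) = h_2(p_r) + \frac{1-p_r}{1-p_d}\, h_2(p_d).$$

Hence

$$H(\Gamma_{[n]}) = \sum_{i=1}^{n} H(\Gamma_i \mid \Gamma_{i-1}) = \sum_{i=1}^{n} H(Z_i \mid Z_{i-1}) = n \left( h_2(p_r) + \frac{1-p_r}{1-p_d}\, h_2(p_d) \right).$$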
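
The per-step entropy $h_2(p_r) + \frac{1-p_r}{1-p_d} h_2(p_d)$ can be cross-checked by summing $-q \log_2 q$ over the increment distribution directly. The sketch below assumes the state increments take the value $1$ with probability $p_r$ and the value $-k$ with probability $(1-p_r)(1-p_d)p_d^k$ for $k \geq 0$; the helper names and the truncation length `terms` are illustrative choices.

```python
import math

def h2(q):
    """Binary entropy function in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def increment_entropy(p_d, p_r, terms=500):
    """Entropy of the state increment, by direct summation over a truncated pmf."""
    probs = [p_r] + [(1 - p_r) * (1 - p_d) * p_d ** k for k in range(terms)]
    return -sum(q * math.log2(q) for q in probs if q > 0)

def closed_form(p_d, p_r):
    """The closed-form per-step entropy quoted in the text."""
    return h2(p_r) + (1 - p_r) / (1 - p_d) * h2(p_d)
```

For moderate $p_d$ the geometric tail decays quickly, so a few hundred terms are typically more than enough for the two values to agree to many decimal places.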
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Appendix III 
Proof of Proposition 10

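Let $Z_0 = 0$ and consider semi-infinite input, state and output
processes. We first note that $\forall\, k, l \in \mathbb{N}$, $k \leq l$,

$$\begin{aligned}
P(Y_{[k:l]} = y_{[k:l]} \mid Z_{k-1} = 0)
&= \sum_{\gamma_{[k:l]}} P(\Gamma_{[k:l]} = \gamma_{[k:l]},\, X_{\gamma_{[k:l]}} = y_{[k:l]} \mid \Gamma_{k-1} = k-1) \\
&\overset{(a)}{=} \sum_{\gamma_{[k:l]}} P(\Gamma_{[k:l]} = \gamma_{[k:l]} - z,\, X_{\gamma_{[k:l]}} = y_{[k:l]} \mid \Gamma_{k-1} = k-1-z) \\
&\overset{(b)}{=} \sum_{\gamma_{[k:l]}} P(\Gamma_{[k:l]} = \gamma_{[k:l]} - z,\, X_{\gamma_{[k:l]} - z} = y_{[k:l]} \mid \Gamma_{k-1} = k-1-z) \\
&= P(Y_{[k:l]} = y_{[k:l]} \mid \Gamma_{k-1} = k-1-z) \\
&= P(Y_{[k:l]} = y_{[k:l]} \mid Z_{k-1} = z) \quad \forall\, z \leq k-1.
\end{aligned}$$

Here, (a) follows from the shift-invariance of $\Gamma$ (see Lemma
7) and (b) from the stationarity of $X$. Therefore, we have

$$\begin{aligned}
P(Y_{[k]} = y_{[k]})
&= \sum_{z} P(Z_0 = z)\, P(Y_{[k]} = y_{[k]} \mid Z_0 = z) \\
&= P(Y_{[k]} = y_{[k]} \mid Z_0 = 0) \\
&= P(Y_{[k]} = y_{[k]} \mid \Gamma_0 = 0) \\
&= \sum_{\gamma_{[k]}} P(\Gamma_{[k]} = \gamma_{[k]},\, X_{\gamma_{[k]}} = y_{[k]} \mid \Gamma_0 = 0) \\
&\overset{(c)}{=} \sum_{\gamma_{[k]}} P(\Gamma_{[j+1:j+k]} = \gamma_{[k]},\, X_{\gamma_{[k]}} = y_{[k]} \mid \Gamma_j = 0) \\
&= P(Y_{[j+1:j+k]} = y_{[k]} \mid Z_j = j) \\
&= P(Y_{[j+1:j+k]} = y_{[k]}) \quad \forall\, j, k \in \mathbb{N},
\end{aligned}$$

where (c) follows from the time-homogeneity of $\Gamma$ (Lemma
7). The last equality above follows from the observation made
in the beginning of the proof.

Appendix IV
Proof of Lemma 13

From Equation (8), we can write

$$\begin{aligned}
(i+j) H_{i+j} &= H(Z_{[N_{i+j}]} \mid X_{[i+j]}, Y_{[N_{i+j}]}) \\
&= H(Z_{[N_i]} \mid X_{[i+j]}, Y_{[N_{i+j}]}) + H(Z_{[N_i+1:N_{i+j}]} \mid X_{[i+j]}, Y_{[N_{i+j}]}, Z_{[N_i]}) \\
&\geq H(Z_{[N_i]} \mid X_{[i+j]}, Y_{[N_{i+j}]}, N_i) + H(Z_{[N_i+1:N_{i+j}]} \mid X_{[i+j]}, Y_{[N_{i+j}]}, Z_{[N_i]}, N_i) \\
&\overset{(a)}{=} H(Z_{[N_i]} \mid X_{[i]}, Y_{[N_i]}) + H(Z_{[N_i+1:N_{i+j}]} \mid X_{[i+1:i+j]}, Y_{[N_i+1:N_{i+j}]}, Z_{N_i}) \\
&= i H_i + j H_j.
\end{aligned}$$

In the above, the equality labeled (a) follows from the
conditional independence of $Z_{[N_i]}$ and $Z_{[N_i+1:N_{i+j}]}$ on
$(X_{[i+1:i+j]}, Y_{[N_i+1:N_{i+j}]})$ and $(X_{[i]}, Y_{[N_i]}, Z_{[N_i-1]})$,
respectively, given the remaining conditioning variables. From
Fekete's Lemma [22, Appendix II], this superadditivity proves the claim.

Appendix V
Special Cases of $D_i^{\mathrm{iud}}$

A. Bounds for $D_2^{\mathrm{iud}}$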

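It is easy to see that when $i = 2$, Equation (11) reduces to

$$H(Z_1 \mid Z_2 = -m, X, Y) = \log_2(m+1) - \frac{1}{2^{m+1}} \sum_{x \in \{0,1\}^{m+1}} h_2\!\left(\frac{w(x)}{m+1}\right), \tag{15}$$

where $w(\cdot)$ denotes Hamming weight. Hence

$$D_2^{\mathrm{iud}} = 1 - p - h_2(p) + (1-p)^3 \sum_{m \geq 0} (m+1)\, p^m\, H(Z_1 \mid Z_2 = -m, X, Y). \tag{16}$$

For numerically estimating the entropy in (15) for large $m$, we can
group the sequences by weight, so that the sum in (15) becomes
$\sum_{j} \binom{m+1}{j} h_2\!\left(\frac{j}{m+1}\right)$, and use
the upper bound [56]

$$\binom{m+1}{j} \leq 2^{(m+1)\, h_2\left(\frac{j}{m+1}\right)} \sqrt{\frac{m+1}{2\pi j (m+1-j)}}$$

to get a further lower bound on this entropy. On the other hand, to
obtain a looser analytic lower bound, we can bound

$$\frac{1}{2^{m+1}} \sum_{x \in \{0,1\}^{m+1}} h_2\!\left(\frac{w(x)}{m+1}\right) \leq 1 - 2^{-m}$$

to get $H(Z_1 \mid Z_2 = -m, X, Y) \geq \log_2(m+1) - 1 + 2^{-m}$. This gives us

$$D_2^{\mathrm{iud}} \geq \frac{4(1-p)^3}{(2-p)^2} + (1-p)^3 \sum_{m \geq 0} (m+1)\, p^m \log_2(m+1) - h_2(p).$$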
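
The entropy expression $\log_2(m+1) - 2^{-(m+1)} \sum_x h_2\!\left(w(x)/(m+1)\right)$ and its analytic lower bound $\log_2(m+1) - 1 + 2^{-m}$ can be compared numerically for small $m$. The sketch below is an illustration under that reading of Equation (15); the function names are hypothetical, and the brute-force enumeration is only meant as a cross-check of the weight-grouped form.

```python
import math
from itertools import product

def h2(q):
    """Binary entropy function in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def entropy_by_enumeration(m):
    """Average over all binary sequences of length m + 1 (feasible for small m only)."""
    n = m + 1
    avg = sum(h2(sum(x) / n) for x in product((0, 1), repeat=n)) / 2 ** n
    return math.log2(n) - avg

def entropy_by_weight(m):
    """Same quantity, grouping the sequences by Hamming weight."""
    n = m + 1
    avg = sum(math.comb(n, j) * h2(j / n) for j in range(n + 1)) / 2 ** n
    return math.log2(n) - avg

def analytic_lower_bound(m):
    """Looser bound obtained from h2 <= 1 and h2(0) = h2(1) = 0."""
    return math.log2(m + 1) - 1 + 2.0 ** (-m)
```

The bound is met with equality for $m = 0$ and $m = 1$, and becomes strictly loose for larger $m$ because $h_2$ is then strictly below 1 at most weights.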



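Unfortunately, it is not easy to evaluate the series

$$\vartheta = \sum_{m=1}^{\infty} m\, p^{m-1} \log_2 m$$

on the right hand side of the above inequality. Consider the
function

$$f(x) = x\, p^{x-1} \ln x,$$

where $\ln(\cdot)$ is the natural logarithm. The $m$th term in the series
can then be written as $(\log_2 e) f(m)$, $m \geq 2$. It turns out that
for

$$p < p^* = \exp\left(-\frac{1 + \ln 2}{2 \ln 2}\right) \approx 0.294832606,$$

the function $f$ is non-increasing on $[2, \infty)$, so
we can lower bound the series $\vartheta$ by the integral

$$\vartheta \geq \log_2 e \int_2^{\infty} f(x)\, dx = \log_2 e \int_2^{\infty} x\, p^{x-1} \ln x\, dx
= \frac{\log_2 e}{\ln p} \left( \frac{p}{\ln p} (1 + \ln 2) - 2p \ln 2 - \frac{1}{p \ln p} \mathrm{Ei}(2 \ln p) \right),$$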

6 We would like to get a lower bound on Z% since this will be a lower 
bound for C lud as well (cf. Equation {T5}). 
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where Ei(x) is the exponential integral function defined as 



$$\mathrm{Ei}(x) = -\int_{-x}^{\infty} \frac{e^{-t}}{t}\, dt,$$

where (a) is true since € [0, 1] V i > 1, j > 
From the channel model, P(Xu-] — xu]\Zi = - 
since X JL Z and X is i.u.d., and 



which can be numerically evaluated to arbitrary accuracy 
through a Taylor series expansion. Therefore, for p < p* , 

4(1 -p) 3 



P(Y[i_i] = y[i-i]\x b 



-1) 



w y[i-i] ( x [i\) 



£>2 Ud > 



(2- P y 



-h 2 (p) 



+ (1 -Pf^^i + ln2) - 2pln2- -Ei(21np) 
mp Vmp 



For y\i—x) — x [i-i]~z [i _ 1] for some realization Z[i-i] with the 
boundary conditions zo = and z s ; = —1, 

H(Z x \Zi = -l,X[i] = X[i],Y[i-x] = y[i-x]) 

1 



With $H(Z_1 \mid Z_2 = -m, X, Y)$ as given in Equation (15), we can write

$$D_2^{\mathrm{iud}} = 1 + p \log_2 p - p \log_2(2e) + O(p^2)$$

for small $p$. This is loose compared to the bound obtained
in [12]. This can be attributed to the fact that we evaluated



H(Zx\Z 2 ,X,y) rather than to obtain L> 2 ud . In 

fact, this small-$p$ series expansion of $D_2^{\mathrm{iud}}$ is no better than
that of the lower bound for the BDC in Corollary 12. We will
improve this bound for small $p$ in the next subsection.

B. Bounds for the case when m = 1 

We now pursue the other case where ( ffT) is easy to evaluate. 
Instead of evaluating Z?J ud exactly, we can further lower bound 
it as follows. 

D? d = 1 - p - h 2 (p) + ( 1 - p) H (Zx I , x , y ) 



where "R>x(x[i],y[i-x\) i s the event that the single deletion 
occurred in the first run of x™ to result in yu-x\- To see this, 
let yu-x] represent a received word resulting from a single 
deletion upon transmission of xui. Consider the two mutually 
exclusive and exhaustive cases in this scenario: 

• The single deletion occurs in a run other than the first run 
of xui. In this case, there is no ambiguity that Z\ = 0, 
and the first run of yr,-_i] is either the same or larger 
tharFlthat of x\ 



= l-p-h 2 (p) + (l-p) 

E 

m>0 



P(Z< = -m)H{Zx\Zi 
l-p-h 2 (p) + (l-p) 



-m,x,y) 



»]• 

• The single deletion occurs in the first run of $x_{[i]}$.

- If $r_1(x_{[i]}) = 1$, there is no ambiguity that $Z_1 = -1$.

- If $r_1(x_{[i]}) > 1$, the deleted symbol could be, with
equal likelihood, one of the symbols comprising the
first run of $x_{[i]}$. The uncertainty in $Z_1$ is $h_2\!\left(\frac{1}{r_1(x_{[i]})}\right)$.

In both the above cases, the uncertainty can be written
as $h_2\!\left(\frac{1}{r_1(x_{[i]})}\right)$.
Therefore,



(2p(z i = -m)F(z 1 |z i = -m,Ar ) y)) w,i = *2^ ^ i 2 V^) r*io™-i]) 



= 1-P-/H(p) + (1-P)*f 

= s$° V j > 0,t > 1. 

We are essentially writing a series expansion for D\ ud and 
lower bounding it by the j th partial sum. Note that we can 
write 



p(z i = -j)H(z 1 \z i = -j,x,y) 



rx(x\i]). 

-- L »j 



= 4-^+^(1-^ 



(17) 



where ^>i >m was defined in Theorem 20 Clearly, the sequence 



{^j'^}j>o is non-decreasing, and, in turn, so is the sequence 



, i-2 . . 
3=1 



(18) 



{2$ ! ., ... Since *W 
py"4>i,x- Further, by definition 



'0i.o = 0, we have = p(l 



Df lA = lim 2# 



sup J) 



(0 



Thus for every j > 0, we can write 



/^iud _ D11T> n iud _ o^^c^/nW ^ 
°BDC 



sup 



sup sup 2) !■ 



sup sup D 

j>0 i>X 



(0 



> sup Hf = T)f d , 

i>l 



7 All terms in the series expansion are non-negative. 



We observe that $\psi_{i,1}$ is non-decreasing in $i$, and converges
exponentially to the value $\psi_1 \approx 1.288531275$. From (17) and
(18), we have

sf = l-p-/i 2 (p)+p(l-p) i+ > M 

= 1 +plog 2 p-plog 2 (2e) + 0(p 2 ). 

Since L>j ud = + E J > 2 P , ( 1 - we have 

^j ud = 1 + plog 2 p - plog 2 (2e) + ^ M p + 0(p 2 ). 

8 When the second run of Xui disappears. 
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Thus, from Equation ( p~2] > 



$$C^{\mathrm{iud}}_{\mathrm{BDC}} = 1 + p \log_2 p - dp + O(p^2) \tag{19}$$

where $d = \log_2(2e) - \psi_1 \approx 1.154163765$. We note here
that this is exactly the same bound obtained in [12] with a
completely different technique. Since this bound was shown
to be tight for small $p$, we have that the capacity of the BDC
itself is given by the above expression for small $p$.
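
The quoted constants can be reproduced numerically. The sketch below assumes that $\psi_1$ equals the series $\sum_{l \geq 1} 2^{-(l+1)}\, l \log_2 l$; this closed form is an assumption made here because it reproduces the quoted value $1.288531275$, and the name `psi_1` and the truncation length are illustrative.

```python
import math

def psi_1(terms=200):
    """Truncated series sum_{l >= 1} l * log2(l) / 2^(l + 1) (assumed form of psi_1)."""
    return sum(l * math.log2(l) / 2 ** (l + 1) for l in range(1, terms + 1))

d = math.log2(2 * math.e) - psi_1()   # coefficient of -p in the small-p expansion (19)
r = 2 - d                             # the BRC counterpart quoted in Appendix VI
```

The terms decay geometrically, so truncating at a couple of hundred terms leaves an error far below the nine decimal places quoted in the text.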

Discussion: The advantage in the evaluation of the above
bound was that, when we restrict to the case of a single
deletion, the ambiguity in the first channel state $Z_1$ arises only
when $r_1(x_{[i]}) > 1$, in which case the uncertainty is exactly
$h_2\!\left(\frac{1}{r_1(x_{[i]})}\right)$. This, however, is not true when there are 2 or
more deletions, wherein we will have to count subsequence
weights of sequences.

C. Bounds for Markov-1 rates 

We can get similar bounds as in the previous two subsec- 
tions for first-order Markov inputs. Further, since the channel 
has no bias for the input symbols, we can confine our attention 
to symmetric Markov inputs. But these calculations will have 
to keep track of ascents and descents in sequences, and are 
therefore more tedious. 

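Proceeding along the same lines as in Appendix V-A, we
can write for $P(X_i = x \oplus 1 \mid X_{i-1} = x) = \alpha \in [0, 1]$,

$$D_2^{\mathrm{M1}} = \max_{\alpha} \left( h_2(\alpha) + (1-p)^2 \sum_{m \geq 0} (m+1)\, p^m\, \xi_m(\alpha) \right) \times (1-p) - h_2(p),$$

where

$$\xi_m(\alpha) = \log_2(m+1) - \sum_{j=0}^{m+1} h_2\!\left(\frac{j}{m+1}\right) \eta(\alpha, j, m+1),$$

and $\eta(\cdot)$ is defined recursively as

$$\begin{aligned}
\eta(\alpha, j, m) &= \eta_0(\alpha, j, m) + \eta_1(\alpha, j, m) \\
\eta_0(\alpha, j, m) &= (1-\alpha)\, \eta_0(\alpha, j, m-1) + \alpha\, \eta_1(\alpha, j, m-1) \\
\eta_1(\alpha, j, m) &= (1-\alpha)\, \eta_1(\alpha, j-1, m-1) + \alpha\, \eta_0(\alpha, j-1, m-1)
\end{aligned}$$

with $\eta_k(\alpha, km, m) = \frac{1}{2}(1-\alpha)^{m-1}$, $\eta_k(\alpha, (1-k)m, m) = 0$
and $\eta_k(\alpha, j, m) = 0$ $\forall\, j \notin \{0\} \cup [m]$ for $k \in \{0, 1\}$.
Similarly, we can also evaluate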
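
The recursion for $\eta$ reads as a dynamic program over the sequence length, tracking the Hamming weight jointly with the last symbol. The sketch below is one possible implementation under that interpretation, assuming a uniformly distributed first symbol (which matches the boundary value $\frac{1}{2}(1-\alpha)^{m-1}$); the function names `weight_distribution` and `xi` are illustrative.

```python
import math

def h2(q):
    """Binary entropy function in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def weight_distribution(alpha, m):
    """Return [eta(alpha, j, m) for j = 0..m] for a symmetric first-order Markov source.

    eta0[j] / eta1[j]: probability of weight j with last symbol 0 / 1.
    """
    eta0 = [0.0] * (m + 1)
    eta1 = [0.0] * (m + 1)
    eta0[0] = 0.5                     # length-1 sequence "0"
    eta1[1] = 0.5                     # length-1 sequence "1"
    for length in range(2, m + 1):
        new0 = [0.0] * (m + 1)
        new1 = [0.0] * (m + 1)
        for j in range(length + 1):
            # appending a 0 keeps the weight; the symbol flips w.p. alpha
            new0[j] = (1 - alpha) * eta0[j] + alpha * eta1[j]
            if j >= 1:
                # appending a 1 raises the weight from j - 1 to j
                new1[j] = (1 - alpha) * eta1[j - 1] + alpha * eta0[j - 1]
        eta0, eta1 = new0, new1
    return [a + b for a, b in zip(eta0, eta1)]

def xi(alpha, m):
    """xi_m(alpha) = log2(m + 1) - sum_j h2(j / (m + 1)) * eta(alpha, j, m + 1)."""
    n = m + 1
    eta = weight_distribution(alpha, n)
    return math.log2(n) - sum(h2(j / n) * eta[j] for j in range(n + 1))
```

For $\alpha = \frac{1}{2}$ the source is i.u.d., so the weight distribution should collapse to the binomial one; this gives a simple consistency check against the i.u.d. case.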
Similarly, we can also evaluate 

Sf 1 = -h 2 (p) + (l-p)x 

max h 2 (a) + p- sup(l -p)'(qy)j(l - a) J_1 ft-2(-) 
«>i i=1 J 

+ l (l-a) l / l2 (i))". 

However, both $D_2^{\mathrm{M1}}$ and $S_1^{\mathrm{M1}}$ turn out to be better than their
SIR counterparts by less than 2%.

Discussion: Although first-order Markov inputs are
expected to perform better than i.u.d. inputs, we see that the
bounds we obtained are almost the same in the two cases.
This is because we are considering two special cases, the first
when $i = 2$ wherein all but a single symbol were deleted,
and the second when $m = 1$ wherein a single symbol was
deleted; and in these cases, a first-order Markov input is not
significantly different from i.u.d. inputs. □

Appendix VI 
Proofs of Results for the BRC 

A. Proof of Corollary g?] 

We have from Proposition [16] and Lemma [T7] that 

H{Z x \Z 2 ,X,y) 



L-BRC — n 2 ~ 



h 2 (a) + 



1-p 



p+(l-a)(l-p) 



h, 



P 



l—p " Vp + (1 — a)(l — p) 

where we have used the expression for $H(Z_1 \mid Y)$ from the
proof of Theorem 23. Observe that $Z_2 \in \{0, 1, 2\}$, and among
these possibilities, the only event wherein there is an ambiguity
in the value of $Z_1$ is when $Z_2 = 1$. Thus, we can see easily
that $H(Z_1 \mid Z_2, X, Y) = 2p(1-p)(1-\alpha)$. Hence



tdMI 
K 2 



h 2 (a) + 2p(l - a) 
p + (l-a)(l-p) 



P 



1— p " \p + (1 — a) (1 — p) 

It can be shown that the optimal $\alpha$ in the above is given by

$$\alpha^* = \frac{1}{(1-p)(2^{2p} + 1)}.$$

Note that $\alpha^*$ is always larger than $\frac{1}{2}$, and $\alpha^* < 1$ for $p < p^*$,
where

$$p^* \approx 0.734675821.$$

Plugging this back in the expression for $R_2^{\mathrm{M1}}$ ends the proof.
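
The threshold $p^*$ can be recovered numerically as the point where $\alpha^*$ reaches 1. The sketch below assumes the form $\alpha^* = 1/\big((1-p)(2^{2p}+1)\big)$, which is the reading of the garbled expression that reproduces the quoted $p^* \approx 0.734675821$; the function names and the bisection bracket are illustrative choices.

```python
def optimal_alpha(p):
    """Assumed closed form of the optimal transition probability alpha*."""
    return 1.0 / ((1.0 - p) * (2.0 ** (2.0 * p) + 1.0))

def critical_p(tol=1e-12):
    """Bisection for the root of (1 - p)(2^(2p) + 1) = 1, i.e. alpha* = 1."""
    def g(p):
        return (1.0 - p) * (2.0 ** (2.0 * p) + 1.0) - 1.0
    lo, hi = 0.0, 0.99                # g(lo) > 0 > g(hi), and g is decreasing on [0, 1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

At $p = 0$ the formula gives $\alpha^* = \frac{1}{2}$, the i.u.d. input, and $\alpha^*$ grows with $p$ until it hits 1 at the threshold.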



B. Proof of Corollary 25 



From Proposition [16] and Lemma [T7] for i.u.d. inputs, 



y-^fiud 



BRC 



1 _H(Zi\y) +su H{Z x \Z u X,y) 
1-p i>l 1-p 



1 



1+p 



h 



Vi + p^ 



2(1 -p) z Vl+p 

EL=o p (^ = m)H{Zi\Zi = m,X,y) 



+ sup 

2 1-p) 2 Vl+p 



1-P 



( £ 



i>i v " \m 

— m=u 



p" 1 (i-p)' 



i— 771 — 1 



x H(Z 1 \Z i =m,X,y) 

i - 1+ P h / 2 P 
2(l-p) ft2 Vl+p 

+ su V (i P (i- P y- 2 H(z 1 \z l = i,x,yj) +o( P 2 ), 
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where we have used the expression for H(Zi\y) from the 
proof of Theorem 23 for a = \. The last equality is true 



where 



since H(Zi\Zi = 0TX,y) = 0. 

As shown in the proof of Theorem 23 we can write 

H(z 1 \z i = i,x,y) 

= E[h 2 (P(Z 1 = 0\Zi = l,r?(x N ),r?M))]. 

Further, there is no ambiguity in Z\ if the single replication 
does not occur in the first run of x^. Therefore, for a first-order 
Markov input process, 

H(Z 1 \Z i = l,X,y) = E[h 2 (P(Z 1 = 0\Z, = 1, 

r 1 (x N )=l,r° 1 ( m ) = l + l))} 

i-l 

= £(l-a)'a 
i=i 

For a = |, we get 



1 I 

iH{z 1 \z i = i,x,y) = -Y,j l \og 2 i- 

i=i 



from Equation(|T8|. Thus, 

1+p 



n 



iud 
BRC 



1 



ho 



2(i- P y~*\i+ P ) 

sup (p(l - p) l ' 2 A 



= 1 +plog 2 J3 + log 2 (-)p 

+sup (p(i - p) i_2 vv 



0(p 2 ), 



o( P 2 ), 

$= 1 + p \log_2 p + rp + O(p^2),$

where $r = \log_2\!\left(\frac{2}{e}\right) + \psi_1 = 2 - d \approx 0.845836235$. As was the
case for the BDC, we expect this to be a tight bound for the
capacity for small $p$.

Appendix VII 
Proof of Lemma 26

From the definition of the $Z^{(m)}$ process in Equation (14),
it is clear that $Z_i^{(m)} \in \{-m, -m+1, \cdots, -1, 0, 1, 2, \cdots, m\}$
for every $i \in \mathbb{N}$, and that it is a Markov chain. It is therefore a
finite Markov chain. The time-inhomogeneity follows by noting
the transition probabilities between states, which can be easily
shown to be given as follows. For $-m < j < m$, $i \geq 1$,

P(Z^=k\Z^=j) 

'Pr, k=j + l 

—to < k < j 
k = —in 

0, otherwise. 
From the states {±to}, the transition probabilities are 

P(Z™=k\Z™=-m) 

!1 — p r p(i,m), k = —to 

p r p(i, to), k = —to + 1 

0, otherwise, 



(1 -p r )(l ~Pd)Pd , 



p(i,m) 



P(Z, 



i-l 



-m) 



P(Z(_i < -to) ' 
and for j such that P(Zj_i > to) > 0, 

P(zf n) = fc|z£"> = to) 

1 - Pd(l -p r )p(i,"i,pd), k = m 

(1 -Pd)(l -pr)Pd™ _fe p( i ' m '-Pd), -m < k < TO 
(1 -Pr)Pd m P( i > m >Pd), fc = -TO 

0, otherwise, 
where 

P(2»-i > m) 

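Note that it is only transitions from the boundary states $\{\pm m\}$
that have time-dependent probabilities.

As was noted in the proofs of Lemmas 6 and 15, we can
write the $Z$ process as $Z_n = \sum_{i=1}^{n} \Xi_i$, where $\{\Xi_i\}_{i \geq 1}$ is an
i.i.d. process with

$$P(\Xi_i = \xi) = \begin{cases} p_r, & \xi = 1 \\ (1-p_d)(1-p_r)\, p_d^{-\xi}, & \forall\, \xi \leq 0 \\ 0, & \text{otherwise.} \end{cases}$$

As noted before, from the SLLN, $\frac{Z_n}{n} \to E[\Xi_1] = \frac{p_r - p_d}{1 - p_d} \triangleq x$
a.s. as $n \to \infty$. Let us write

$$\mathrm{Var}[\Xi_1] = E[\Xi_1^2] - E[\Xi_1]^2 = \frac{(p_r + p_d)(1 - p_r)}{(1 - p_d)^2} \triangleq v.$$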
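
The mean and variance of the increments can be checked by summing over the increment distribution directly. The sketch below assumes the pmf $P(\Xi = 1) = p_r$ and $P(\Xi = -k) = (1-p_d)(1-p_r)p_d^k$ for $k \geq 0$, and compares against the closed forms $(p_r - p_d)/(1-p_d)$ and $(p_r + p_d)(1-p_r)/(1-p_d)^2$; the latter is derived here from that pmf, and the helper names are illustrative.

```python
def increment_pmf(p_d, p_r, terms=400):
    """Truncated pmf of the i.i.d. state increments."""
    pmf = {1: p_r}
    for k in range(terms):
        pmf[-k] = (1 - p_d) * (1 - p_r) * p_d ** k
    return pmf

def moments(pmf):
    """Return (mean, variance) of a pmf given as {value: probability}."""
    mean = sum(x * q for x, q in pmf.items())
    second = sum(x * x * q for x, q in pmf.items())
    return mean, second - mean ** 2

def closed_form_moments(p_d, p_r):
    """Closed-form mean and variance of the increments."""
    mean = (p_r - p_d) / (1 - p_d)
    var = (p_r + p_d) * (1 - p_r) / (1 - p_d) ** 2
    return mean, var
```

Setting $p_d = p_r = p$ gives zero drift and variance $2p/(1-p)$, the symmetric case used later for the SDRC.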

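From the central limit theorem (CLT), we have

$$P(Z_n \geq m) = P\left( \frac{Z_n - nx}{\sqrt{nv}} \geq \frac{m - nx}{\sqrt{nv}} \right) \to Q(t),$$

$$P(Z_n \leq -m) = P\left( \frac{Z_n - nx}{\sqrt{nv}} \leq \frac{-m - nx}{\sqrt{nv}} \right) \to Q(b),$$

where

$$t \triangleq \lim_{n \to \infty} \frac{m - nx}{\sqrt{nv}} \quad \text{and} \quad b \triangleq \lim_{n \to \infty} \frac{m + nx}{\sqrt{nv}}$$

with $t, b \in \mathbb{R} \cup \{\pm\infty\}$, and

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-\frac{u^2}{2}}\, du.$$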
Writing m = lirrin^oo ^, we can say that when x > 0, 



P(Z< m > = to) = P(Z n > to) ' 
and when x < 0, 

P(Z< m ) = -to) = P(Z„ < -to) 

When x = 0, for to = o(y/n), 

P(4™)=m) 1 
P(4-)=-m)J 



1. 


m < 


-x 


1 

2 ' 


m = 


-x 


0, 


m > 


-x 
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Hence 

P(ZM e {±to}) 1 | for m = ° ( ^ When X * °' 

I for to = o{y/n) when x = 0. 

The result then follows by noting that $x$ is equal (or not equal)
to 0 accordingly as $p_d$ is equal (or not equal, respectively) to
$p_r$.

Appendix VIII 
Proof of Lemma 27

We first note that although N n and N^n might themselves 
differ, both sets {T^j} and {r^ m) } are subsets of [n]U{0}. 

Therefore, assuming that all random variables X$ where i £ 
[n] are constants (in particular, we assume that these random 
variables are all equal to 0), we can consider the above sets of 
indices to be {r [JV ( m! „)]} and {T^ m ^} respectively, where 
we define 

N(m, n) 4 N n V N 7 [ m *> = m&x{N n , N™}. 

We have N(m, n) < oo a.s. for every n € N, to € Z + . 

Let S+ = inf{£ > : Z i+1 > to} and T+ = inf{i > : 
Zs++\+i < m}, where we define inf0 = oo. Then, {5+ + 
1, + 2, • • • , T±} n [N(m, n)] is the set of instances where 
Zi and z\ m ' differ for the first time as a result of Zi exceeding 
m. In this case, Z s + = m, Z s + +1 = to + 1 with probability 
1 from the definition of the ij process. Further, to + 1 < 
Z s + + j < m + j,j = 1,2, ••• , Tf a.s., implying for this 
range of js that 5+ — to < T s + +J - < + j — to — 1 a.s.. 
But, by definition, . = m, j = 0, 1, • • ■ ,T X + , and hence 

T ( + ) . = St +j - m. Thus, if we write Ut = {St,St + 
1, • • ■ , 5^ + 7\ + }, then we have 

{r^} c {rjj>} a. B .. 

Since Z s + = Z { ™ ] = m, T, < T s + = S+-m V i < 5^ a.s. 
from Lemma |tJ Similarly, since Z s + +T + +1 = -^g++T+ +1 — 
to, Ti > ^ s ++t++i > + l-m\f i> sf +T+ + 1 

a.s. from Lemma 7 It follows that the indices in {T^} \ 

cannot appear in {Tu} for any U C [N(m,nj\ \ U x . 
Using similar arguments, by recursively defining for i > 2 

S+ = inf {« > fi+ ! + l£ x : Z i+1 > to}, 
= inf{« > : Z s + +1+i < to}, 

and letting U+ = {S+ , S+ + 1, ■ ■ ■ , S+ + T+}, we can show 
that 

{r^} c {r£?} a. B .v * > i. 

Similarly, consider 

S = To = 0, 

= inf{i > + T! i l 1 : < -to} and 
Tr = inf{i > : Z s - +i > -to}, i > 1. 

Then, Z S -, T - = —to and = — to — 1 with proba- 

bility 1, Further, with probability 1, — to — j < Z s - +T -_. < 



-m - 1, j = 1,2, • • • ,Ti a.s., implying 5, + T { - j + 
to + 1 < r s - +T -_ j - < + Tf + m a.s.. By definition, 

T ™+T t --j = S r + *T - i + = 0, 1, - •• and 
consequently 

{r ur }c{r^ ) }a.s.Vz>i 

where Ur = {S~ ,S~ + I, - ■ ■ ,Sj~ + Tf}. As before, the 
missing indices cannot appear in for any U C [N(m, n)] \ 

Therefore, writing U± = \Jt>i( v t u u r)' 

we have 

{rv}c{r^}a. s . 

Let U° = [iV(m, n)l \ U . Since U ± consists of all indices 
i where Ti and T^' differ, we have from the above relation 
that almost surely, 

{r u± }u{r u o}c{r^ ) }u{r^ ) } 

{^[N(m,n)]} C {r[^ (m _ n)] } 

=>{T [ffll ]}n[ n ]c{rg B)] }n[4 

We are interested in the intersection in the last step above 
since only indices in the set [n] are indices of non-constant 
random variables. 

We can use an argument similar to the one above to show 
that V m e Z+, 

{rftjnMc^nM a.s.. 

Appendix IX 
Proof of Proposition 28

We use the result from Lemma 27. Let us define

s m = f| {? e s : {r [Nn] } n N c {rg m)] } n [ n ]}, and 

n > 1 

s m = fl {? € s : {r^,,} n N c {r[^ m)] } n [»]} 

n>l 

and let 

s* = f) (§ m n§ m ). 

rnGZ+ 

Clearly, P(S*) = 1, Then, confining the expectations over the 
set S*, 

= I§*(X [n] ;X r(m) \ x r lNn] ) > 0, 

where /§»(■) denotes the mutual information obtained after 
confining the expectations to the set §*. Similarly, we have 
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Appendix X 
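Extending the Measures $P_{(m)}$ to $\mathcal{B}$
We will first assume that $\mathbb{S} = \mathbb{S}_X \times \mathbb{S}_Z$ and that $\mathcal{B} =
\mathcal{B}_X \times \mathcal{B}_Z$ with $\mathcal{B}_X = \sigma(X)$ and $\mathcal{B}_Z = \sigma(Z)$, i.e., the space
$(\mathbb{S}, \mathcal{B})$ is a product space. Since in our model $X \perp Z$, there
is no loss of generality in this assumption.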

By defining the stationary transition probabilities 

Pi 



(m) 
3 (m) 



(Zi|Z ) as in 
are well-defined 



Section V-B the 



measures 



over 'S 

{Z^))- l {z) ^{^§ z : Z("%) 
similarly Z" 1 ^) = {<;e§ z : Z(<r) 

Z^z) C (2W)-i( ? )Vi e I ±m 



= cr(Z( m )). Let 
z} for z £ Z± m , and 
z}. Then, clearly 



and 



z~ l (z) e (z^r 1 ^) g Sf m Vie Z ±m . 

Then, we define 

P^Z-^z)) = P (m) ((Z( m ))- 1 (z)) V z GZ ±m . (20) 
This will imply that for every z G Z± m , 

p (m> ((z(-))- 1 (?)\z- 1 (?)) = o. 

By definition, we also have for z G Z \ Z± m that 
(Z^ m ^) _1 (z) = so that the associated probability is 
zero under any measure P, P/ m \. We can now consider the 
space (§z,3§z,P(m)) to be obtained from (§.z,Sf m , P( m )) 



along with the definition pO| i and subsequent completion p7| 
§2.6.19]. 

By now defining $P_{(m)}(X) = P(X)$ independent of $m$, we
can extend the measure $P_{(m)}$ to $\mathcal{B} = \sigma(\{X, Z\})$ for each
$m \in \mathbb{Z}^+$ as required.

Appendix XI 
Proof of Lemma 37

As noted in the proof of Lemma 26, we have for every $n \in
\mathbb{N}$ that $Z_n$ is the $n$th partial sum of the i.i.d. process $\{\Xi_j\}_{j \geq 1}$.
For the SDRC, we have $E[\Xi_1] = x = 0$ and $\mathrm{Var}[\Xi_1] = v = \frac{2p}{1-p}
< \infty$ since $p \in [0, 1)$.

Let = a {{Zn}) C 38, the sigma-algebra generated 
by Z n , for every n £ N. Clearly, 5? n = cr({Sr n i}) so 
that {^ n }„>i is a filtration, and Z n £ S^ n by definition. 
Let Sf n f ~5? C ^ as n -> oo. Then for every n G N, 
Z„ G L 2 (S,^, P) since 



E[|Z„| 2 ] = E[Z 2 ] = e[ 



n 

E s 

n 



= E E ^ + E E E ^ 



n • Var[Si] 

n 

1-p 



n n 

^E E 

i=l j=l j/i 



E[3,]E[S, 



< oo. 



Further, E^.^] - E[Z„_x + S„j^_ a ] = 
Therefore, {Z n , ^ n } n >i is a martingale under the measure 
P. Consequently, {\Z n \, S^ n } n >i is a submartingale. 



Since |Z„| G £ 2 (§, P), from Doob's submartingale 
inequality |20 §14.6], we have 

E[|^| 2 ] 



P(max|Z l | > to) < 

z=l 



We have (cf. Section 
then follows by noting that 



V-Ak that Nl m) < n 



Jp 

1 — p) m 2 

to. The result 



Pf max \ZA > m) < P(max \ZA > to) 

V i—l J 2=1 



and the above result. The bound with respect to the measure 
P/ m \ is true because 

P (m) f d £ § : max \Z0)\ > to J 

= Ph?eS: max \Zi($)\>m 

from the definition of the measure P/ m \ (See Section 
Appendix |Xl). 



V-B 



and 



Appendix XII 
Proof of Proposition!^ 

We start with a small Lemma. 

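Lemma 39: Let $(T, \mathcal{A})$ be a measurable space, and let
$\{Q_n\}_{n \geq 1}$, $Q$ all be probability measures on this space.
Suppose that

i) For every $n \geq 1$, there is a set $B_n \in \mathcal{A}$ such that
$Q_n(A) = Q(A)$ for every $A \subset B_n$, $A \in \mathcal{A}$.

ii) $Q(B_n) \to 1$ as $n \to \infty$.

Then the measures $Q_n$ converge in total variation to $Q$, i.e.,
$Q_n \xrightarrow{TV} Q$ as $n \to \infty$.

Proof: From ii), for every $\epsilon > 0$, there exists $n'(\epsilon) \in \mathbb{N}$
such that

$$Q(B_n) > 1 - \epsilon \quad \forall\, n \geq n'(\epsilon).$$

From i), $Q_n(A \cap B_n) = Q(A \cap B_n)$ for every $n \geq 1$, $A \in \mathcal{A}$.
Therefore, for every $\epsilon > 0$,

$$\|Q_n - Q\| = 2 \sup_{A \in \mathcal{A}} |Q_n(A) - Q(A)|
= 2 \sup_{A \in \mathcal{A}} |Q_n(A \cap B_n^c) - Q(A \cap B_n^c)|
\leq 2\epsilon \quad \forall\, n \geq n'(\epsilon).$$

Hence $Q_n \xrightarrow{TV} Q$ as $n \to \infty$.

Note that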



|^e§: max > mj 

l ^ where F 

P(X[„j, Y[jy n ]). From Lemma 



is the subset of § in ^ where P( m ) (-^[n] > Y ™(L) ) differs from 



37 



we have 



P(m(n)>On,m(n)) = P(®n,m(n)) 

as $n \to \infty$, whenever $m(n) = \omega(\sqrt{n})$. Consider henceforth
that $m(n)$ satisfies this condition. In Lemma 39 above, set



T 



S, si 



% Qn 



(m(n)) 



and Q 



P. Note that 



although P( m („)) is only defined on ^ nim ( n ) (cf. Proposition 
30 1, we can extend it to 38 such that it agrees with the measure 
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P on every subset of B„ for each n > 1. Then for each n £ N, 



we see that by setting 



n.m{n) 



both conditions i) and 
ii) in Lemma 39 are satisfied. From this and [31, Corollary 
1'], we have the desired result. 
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